System and method for performing compound vector operations
Abstract
A processor particularly useful in multimedia applications such as image processing is based on a stream programming model and has a tiered storage architecture to minimize global bandwidth requirements. The processor has a stream register file through which the processor's functional units transfer streams to execute processor operations. Load and store instructions transfer streams between the stream register file and a stream memory; send and receive instructions transfer streams between stream register files of different processors; and operate instructions pass streams between the stream register file and computational kernels. Each of the computational kernels is capable of performing compound vector operations. A compound vector operation performs a sequence of arithmetic operations on data read from the stream register file, i.e., a global storage resource, and generates a result that is written back to the stream register file. Each function or compound vector operation is specified by an instruction sequence that specifies the arithmetic operations and data movements that are performed each cycle to carry out the compound operation. This sequence can, for example, be specified using microcode.
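The data flow described in the abstract can be illustrated with a minimal software sketch. It is not the patented hardware or its microcode: the names `StreamRegisterFile`, `load`, `store`, `operate`, `compound_kernel`, and `demo` are all hypothetical, and the arithmetic inside the kernel is an arbitrary example. The sketch only shows the general idea that a compound operation reads its inputs from the stream register file once, keeps intermediate values local, and writes back only the final result. Send and receive between processors are omitted.

```python
from typing import Callable, Dict, List

Stream = List[float]


class StreamRegisterFile:
    """Hypothetical stand-in for the stream register file (global storage)."""

    def __init__(self) -> None:
        self.streams: Dict[str, Stream] = {}

    def read(self, name: str) -> Stream:
        return list(self.streams[name])

    def write(self, name: str, data: Stream) -> None:
        self.streams[name] = list(data)


def load(memory: Dict[str, Stream], srf: StreamRegisterFile, name: str) -> None:
    """Load: copy a stream from stream memory into the stream register file."""
    srf.write(name, memory[name])


def store(srf: StreamRegisterFile, memory: Dict[str, Stream], name: str) -> None:
    """Store: copy a stream from the stream register file back to stream memory."""
    memory[name] = srf.read(name)


def compound_kernel(a: float, b: float) -> float:
    """Illustrative compound operation: several arithmetic steps per element."""
    product = a * b        # intermediate value, kept local to the kernel
    shifted = product + a  # reuses the intermediate without a register file access
    return shifted * 0.5   # only this final value is written back


def operate(
    srf: StreamRegisterFile,
    kernel: Callable[..., float],
    inputs: List[str],
    output: str,
) -> None:
    """Operate: pass input streams through a kernel and write the result stream."""
    in_streams = [srf.read(name) for name in inputs]
    result = [kernel(*elems) for elems in zip(*in_streams)]
    srf.write(output, result)


def demo() -> Stream:
    memory: Dict[str, Stream] = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}
    srf = StreamRegisterFile()
    load(memory, srf, "a")
    load(memory, srf, "b")
    operate(srf, compound_kernel, ["a", "b"], "out")
    store(srf, memory, "out")
    return memory["out"]


if __name__ == "__main__":
    print(demo())
```

In this sketch the two intermediate values per element never reach the register file; a non-compound formulation would instead write and re-read a full stream after each arithmetic step.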
109 Citations
29 Claims
-
1. A data processing system comprising:
-
a controller;
at least one arithmetic cluster capable of independently and sequentially performing compound arithmetic operations, responsive to commands directly operatively provided from the controller, on data presented at an input thereof and providing resultant processed data at an output thereof, and capable of utilizing intermediate data generated as a result of performing the operations in subsequent operations without retrieving the intermediate data from a source external to that arithmetic cluster; and
a stream register file directly operatively coupled to the cluster and being selectively readable and writable, responsive to commands from the controller, by each of the at least one arithmetic cluster for holding the resultant processed data of the at least one arithmetic cluster.
the local storage unit is connected to an input of the functional element within the arithmetic cluster; and
data stored in the local storage unit is directly accessible only by the functional element to which it is connected.
-
-
6. The system of claim 4, wherein data stored in the local storage unit is accessible by a plurality of functional elements in the arithmetic cluster containing that local storage unit and plurality of functional elements.
-
7. The system of claim 3, wherein the crossbar switch is a sparse crossbar switch.
-
8. The system of claim 2, wherein the plurality of functional elements includes a scratchpad register file.
-
9. The system of claim 2, wherein the plurality of functional elements includes an intercluster communication unit for communicating with other arithmetic clusters.
-
10. The system of claim 1, wherein an arithmetic cluster includes a local storage unit for storing data to be used by the arithmetic cluster in subsequent arithmetic operations.
-
11. The system of claim 1, further comprising a host processor capable of selectively reading and writing the stream register file.
-
12. The system of claim 11, further comprising:
a network interface connected to the stream register file for exchanging data between the stream register file and another system.
-
13. The system of claim 1, wherein the at least one arithmetic cluster is a plurality of arithmetic clusters each capable of independently and sequentially performing compound arithmetic operations, responsive to commands from the controller, on data presented at respective inputs thereof and providing resultant processed data at respective outputs thereof, and capable of utilizing intermediate data generated as a result of performing the operations in subsequent operations without retrieving the intermediate data from a source external to that arithmetic cluster.
-
14. The system of claim 1, further comprising a global storage unit being selectively readable and writable, responsive to commands from the controller, only by the stream register file.
-
15. The system of claim 14, wherein the stream register file is selectively and independently writable, responsive to the controller, by at least two of the controller, the global storage unit and an arithmetic cluster.
-
16. The system of claim 14, wherein the global storage unit is selectively readable and writable, responsive to the controller, by the stream register file in independent, simultaneous transfers.
-
28. The system of claim 1, wherein cluster instructions and at least one of data input and output streams are provided to the at least one cluster responsive to a stream instruction.
-
29. The system of claim 8, wherein the scratchpad register file is independently addressable for the cluster which it is in using a computed address.
-
17. A method of processing data comprising:
-
performing multiple arithmetic operations simultaneously and independently in each of a plurality of arithmetic clusters responsive to commands directly operatively provided from a controller, at least some of the arithmetic operations utilizing data generated and supplied by the arithmetic clusters without retrieving the generated data from a source external to the arithmetic clusters; and
reading data used by the arithmetic clusters from and writing data generated by the arithmetic clusters to a stream register file connected directly to the plurality of arithmetic clusters.
-
Specification