Memory-network processor with programmable optimizations
First Claim
1. An apparatus, comprising:
an execution unit;
a fetch unit configured to receive a multi-part instruction, wherein the multi-part instruction includes a plurality of fields; and
a plurality of address generator units;
wherein a first address generator unit of the plurality of address generator units is configured to perform a first arithmetic operation for a first thread of sub-instructions dependent upon a first field of the plurality of fields and store a result of the first arithmetic operation in a register;
wherein a second address generator unit is configured to generate at least one address of a plurality of addresses, wherein each address of the plurality of addresses is dependent upon a respective field of the plurality of fields, wherein the apparatus is configured to use the at least one address to access one or more input operands for the execution unit for a second thread of sub-instructions; and
wherein the apparatus is configured to use the result of the first arithmetic operation stored in the register to access one or more input operands for the execution unit for a sub-instruction in the first thread of sub-instructions in a subsequent multi-part instruction.
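The data flow recited in claim 1 can be illustrated with a toy software model. This is only a sketch under stated assumptions: the two-field instruction layout, the modular increment standing in for the "first arithmetic operation", the addition performed by the execution unit, and all names (`Processor`, `index_reg`, `run_program`) are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    """Toy model of one processing element (all names are assumptions)."""
    memory: list
    index_reg: int = 0  # register written by the first address generator unit

    def step(self, fields):
        # fields[0]: drives the first AGU's arithmetic (assumed: an increment).
        # fields[1]: drives the second AGU's address (assumed: a direct address).

        # First thread: the operand is fetched using the register value that
        # the first AGU stored while a previous instruction was processed.
        operand_a = self.memory[self.index_reg]

        # Second thread: the second AGU generates an address from its own field.
        operand_b = self.memory[fields[1]]

        # First AGU: arithmetic operation dependent on the first field; the
        # result is stored for use by a subsequent multi-part instruction.
        self.index_reg = (self.index_reg + fields[0]) % len(self.memory)

        # Execution unit (assumed operation: addition of the two operands).
        return operand_a + operand_b


def run_program(program, memory):
    """Run a list of multi-part instructions; return results and final register."""
    pe = Processor(memory=list(memory))
    results = [pe.step(fields) for fields in program]
    return results, pe.index_reg


if __name__ == "__main__":
    mem = [10, 20, 30, 40, 50, 60, 70, 80]
    prog = [(2, 1), (2, 3), (2, 5)]
    print(run_program(prog, mem))  # -> ([30, 70, 110], 6)
```

In this model the second instruction reads `memory[2]` for the first thread only because the first instruction's increment field left `2` in `index_reg`, which mirrors the claim's requirement that the stored result be used by a sub-instruction in a subsequent multi-part instruction.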
Abstract
Various embodiments are disclosed of a multiprocessor system with processing elements optimized for high performance and low power dissipation, and of an associated method of programming the processing elements. Each processing element may comprise a fetch unit, a plurality of address generator units, and a plurality of pipelined datapaths. The fetch unit may be configured to receive a multi-part instruction, wherein the multi-part instruction includes a plurality of fields. A first address generator unit may be configured to perform an arithmetic operation dependent upon a first field of the plurality of fields. A second address generator unit may be configured to generate at least one address of a plurality of addresses, wherein each address is dependent upon a respective field of the plurality of fields. A parallel assembly language may be used to control the plurality of address generator units and the plurality of pipelined datapaths.
18 Citations
15 Claims
1. An apparatus, comprising:
an execution unit;
a fetch unit configured to receive a multi-part instruction, wherein the multi-part instruction includes a plurality of fields; and
a plurality of address generator units;
wherein a first address generator unit of the plurality of address generator units is configured to perform a first arithmetic operation for a first thread of sub-instructions dependent upon a first field of the plurality of fields and store a result of the first arithmetic operation in a register;
wherein a second address generator unit is configured to generate at least one address of a plurality of addresses, wherein each address of the plurality of addresses is dependent upon a respective field of the plurality of fields, wherein the apparatus is configured to use the at least one address to access one or more input operands for the execution unit for a second thread of sub-instructions; and
wherein the apparatus is configured to use the result of the first arithmetic operation stored in the register to access one or more input operands for the execution unit for a sub-instruction in the first thread of sub-instructions in a subsequent multi-part instruction.
Dependent claims: 2, 3, 4, 5
6. A method for operating a processor, the method comprising:
receiving a multi-part instruction, wherein the multi-part instruction includes a plurality of fields;
performing an arithmetic operation for a first thread of sub-instructions dependent on a first field of the plurality of fields and storing a result of the first arithmetic operation in a register;
generating a given address of a plurality of addresses dependent upon a respective field of the plurality of fields and using the given address to access one or more input operands for an execution unit for a second thread of sub-instructions; and
using the result of the first arithmetic operation stored in the register to access one or more input operands for the execution unit for a sub-instruction in the first thread of sub-instructions in a subsequent multi-part instruction.
Dependent claims: 7, 8, 9, 10
11. A system, comprising:
a plurality of processors; and
a plurality of dynamically configurable communication elements;
wherein the plurality of processors and the plurality of dynamically configurable communication elements are coupled together in an interspersed arrangement;
wherein a given processor of the plurality of processors is configured to:
receive a multi-part instruction, wherein the multi-part instruction includes a plurality of fields;
perform an arithmetic operation for a first thread of sub-instructions dependent upon a given field of the plurality of fields and store a result of the first arithmetic operation in a register;
generate a plurality of addresses dependent upon a subset of the plurality of fields, wherein the given processor is configured to use the plurality of addresses to access one or more input operands for an execution unit for a second thread of sub-instructions, and
use the result of the first arithmetic operation stored in the register to access one or more input operands for the execution unit for a sub-instruction in the first thread of sub-instructions in a subsequent multi-part instruction.
Dependent claims: 12, 13, 14, 15
Specification