Multithreaded programmable processor and system with partitioned operations
Abstract
A programmable processor and method for improving the performance of processors by incorporating an execution unit configurable to execute a plurality of instruction streams from a plurality of threads, wherein each instruction stream includes a group instruction that operates on a plurality of data elements in partitioned fields of at least one register to produce a catenated result.
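The group instruction described in the abstract can be pictured with a small software model. The sketch below is only an illustration, not the patented hardware: the function name group_add, the 128-bit default register width, and the choice of wrap-around addition as the arithmetic operation are all assumptions made for the example. It treats a register as one integer, applies the same addition to every partitioned field, and catenates the per-field results into a single register-wide value.

```python
REG_BITS = 128  # assumed register width; the claims do not fix a number


def group_add(ra: int, rb: int, elem_bits: int, reg_bits: int = REG_BITS) -> int:
    """Add corresponding partitioned fields of two register values.

    Each field is elem_bits wide. A carry out of one field is discarded,
    so it never spills into the neighbouring field.
    """
    if elem_bits <= 0 or reg_bits % elem_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    mask = (1 << elem_bits) - 1
    result = 0
    for shift in range(0, reg_bits, elem_bits):
        a = (ra >> shift) & mask          # field from the first operand
        b = (rb >> shift) & mask          # matching field from the second operand
        result |= ((a + b) & mask) << shift  # catenate the field result in place
    return result
```

With 16-bit fields in a 32-bit register, group_add(0x00FF0001, 0x00010001, 16, 32) gives 0x01000002; with 8-bit fields the same operands give 0x00000002, because the 0xFF + 0x01 field wraps to zero instead of carrying into its neighbour. Hardware would perform all field operations at once; the loop here only models the result.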
174 Citations
28 Claims
1. A programmable processor comprising:
a data path capable of transmitting data;
an external interface operable to receive data from an external source and communicate the received data over the data path;
a register file containing a plurality of registers each having a register width, the register file coupled to the data path and configured to support processing of a plurality of threads and to store a plurality of multiple-bit data elements in partitioned fields, each of the multiple-bit data elements having an elemental width smaller than the register width;
an execution unit coupled to the data path, the execution unit configured to execute a plurality of instruction streams from the plurality of threads in a multistage pipeline such that the multistage pipeline is capable of including instructions from different ones of the instruction streams in different stages of the multistage pipeline, each instruction stream including a single arithmetic instruction that specifies an arithmetic operation to cause multiple instances of the arithmetic operation to be performed, each instance of the arithmetic operation to be performed using a different one of the plurality of multiple-bit data elements in partitioned fields of at least one of the registers to produce a catenated result, the single arithmetic instruction causing a plurality of multiple-bit data elements in partitioned fields to be read in parallel from a register included in the register file, and causing the catenated result to be written in parallel to one of the registers included in the register file; and
wherein each of the multiple-bit data elements has an elemental width, and the data path has a data path width multiple times greater than the elemental width, to allow multiple-bit data elements used for the multiple instances of the arithmetic operation to be transmitted in parallel from the register file to the execution unit, and wherein the execution unit is operable to receive, in parallel, multiple-bit data elements for the multiple instances of the arithmetic operation and execute the multiple instances of the single arithmetic instruction to produce the catenated result.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
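Claim 1 requires a multistage pipeline that can hold instructions from different instruction streams in different stages at the same time. A minimal simulation of that property is sketched below. The round-robin issue policy, the function name simulate_pipeline, and the default stage count are assumptions made for illustration; the claim itself does not prescribe any scheduling policy.

```python
from collections import deque


def simulate_pipeline(streams, num_stages=4):
    """Return one snapshot of the pipeline per cycle.

    Each snapshot is a list with one entry per stage: either None (empty)
    or a (thread_id, instruction) pair. Stage 0 is the issue stage.
    """
    queues = [deque(s) for s in streams]
    pipeline = [None] * num_stages
    snapshots = []
    next_thread = 0
    while any(queues) or any(slot is not None for slot in pipeline):
        # Every instruction already in flight moves one stage forward.
        pipeline = [None] + pipeline[:-1]
        # Issue from the next thread (round-robin, assumed) that still has work.
        for offset in range(len(queues)):
            t = (next_thread + offset) % len(queues)
            if queues[t]:
                pipeline[0] = (t, queues[t].popleft())
                next_thread = (t + 1) % len(queues)
                break
        if any(slot is not None for slot in pipeline):
            snapshots.append(list(pipeline))
    return snapshots
```

For two streams ["a0", "a1"] and ["b0", "b1"] and three stages, the third snapshot is [(0, "a1"), (1, "b0"), (0, "a0")]: adjacent stages hold instructions from different threads, which is the situation the claim language describes.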
10. A programmable processor comprising:
a data path capable of transmitting data;
an external interface operable to receive data from an external source and communicate the received data over the data path;
first and second register files containing a plurality of registers each having a register width, the first and second register files coupled to the data path and configured to support processing of first and second threads, respectively, and to store a plurality of multiple-bit data elements in partitioned fields, each of the multiple-bit data elements having an elemental width smaller than the register width;
an execution unit coupled to the data path, the execution unit configured to execute first and second instruction streams from the first and second threads, respectively, in a multistage pipeline such that the multistage pipeline is capable of including instructions from different ones of the instruction streams in different stages of the multistage pipeline, the first and second instruction streams each including a single arithmetic instruction that specifies an arithmetic operation to cause multiple instances of the arithmetic operation to be performed, each instance of the arithmetic operation to be performed using a different one of multiple-bit data elements in partitioned fields of at least one of the registers to produce a catenated result, the single arithmetic instruction causing a plurality of multiple-bit data elements in partitioned fields to be read in parallel from a register included in the register file, and causing the catenated result to be written in parallel to one of the registers included in the register file; and
wherein each of the multiple-bit data elements has an elemental width, and the data path has a data path width multiple times greater than the elemental width, to allow multiple-bit data elements used for the multiple instances of the arithmetic operation to be transmitted in parallel from the first register file and from the second register file to the execution unit, and wherein the execution unit is operable to receive, in parallel, multiple-bit data elements for the multiple instances of the arithmetic operation and execute the multiple instances of the single arithmetic instruction to produce the catenated result.
Dependent claims: 11, 12, 13, 14
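Claim 10 differs from claim 1 mainly in giving each of two threads its own register file while both share one execution unit. The sketch below models that arrangement in software under stated assumptions: the class names RegisterFile and Processor, the 32-register file size, the 128-bit register width, and the use of wrap-around addition are all hypothetical choices for the example, not details taken from the claims.

```python
from dataclasses import dataclass, field

REG_BITS = 128   # assumed register width
NUM_REGS = 32    # assumed number of registers per file


def group_add(ra: int, rb: int, elem_bits: int, reg_bits: int = REG_BITS) -> int:
    """Field-wise wrap-around add; stands in for the shared execution unit."""
    mask = (1 << elem_bits) - 1
    result = 0
    for shift in range(0, reg_bits, elem_bits):
        total = ((ra >> shift) & mask) + ((rb >> shift) & mask)
        result |= (total & mask) << shift
    return result


@dataclass
class RegisterFile:
    regs: list = field(default_factory=lambda: [0] * NUM_REGS)


class Processor:
    """Two register files (one per thread) feeding one execution unit."""

    def __init__(self):
        self.reg_files = [RegisterFile(), RegisterFile()]

    def execute_group_add(self, thread: int, rd: int, ra: int, rb: int,
                          elem_bits: int) -> None:
        rf = self.reg_files[thread]  # operands come from this thread's file only
        rf.regs[rd] = group_add(rf.regs[ra], rf.regs[rb], elem_bits)
```

Because each call reads and writes only the issuing thread's register file, the two threads can use the same register numbers without disturbing each other's state.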
15. A data processing system comprising:
(a) a bus coupling components in the data processing system;
(b) an external memory coupled to the bus;
(c) a programmable microprocessor coupled to the bus and capable of operation independent of another host processor, the microprocessor comprising:
a data path capable of transmitting data;
an external interface operable to receive data from an external source and communicate the received data over the data path;
a register file containing a plurality of registers each having a register width, the register file coupled to the data path and configured to support processing of a plurality of threads and to store a plurality of multiple-bit data elements in partitioned fields, each of the multiple-bit data elements having an elemental width smaller than the register width;
an execution unit coupled to the data path, the execution unit configured to execute a plurality of instruction streams from the plurality of threads in a multistage pipeline such that the multistage pipeline is capable of including instructions from different ones of the instruction streams in different stages of the multistage pipeline, each instruction stream including a single arithmetic instruction that specifies an arithmetic operation to cause multiple instances of the arithmetic operation to be performed, each instance of the arithmetic operation to be performed using a different one of the plurality of data elements in partitioned fields of at least one of the registers to produce a catenated result, the single arithmetic instruction causing a plurality of multiple-bit data elements in partitioned fields to be read in parallel from a register included in the register file, and causing the catenated result to be written in parallel to one of the registers included in the register file; and
wherein each of the multiple-bit data elements has an elemental width, and the data path has a data path width multiple times greater than the elemental width, to allow multiple-bit data elements used for the multiple instances of the arithmetic operation to be transmitted in parallel from the register file to the execution unit, and wherein the execution unit is operable to receive, in parallel, multiple-bit data elements for the multiple instances of the arithmetic operation and execute the multiple instances of the single arithmetic instruction to produce the catenated result.
Dependent claims: 16, 17, 18, 19, 20, 21
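The closing "wherein" clause ties the data path width to the elemental width: because the data path is several elements wide, that many elements can travel from the register file to the execution unit in one transfer. A rough sketch of the arithmetic follows. The helper names lanes, split_fields, and catenate are hypothetical, and the low-to-high field ordering is an assumption for the example.

```python
def lanes(data_path_bits: int, elem_bits: int) -> int:
    """Number of elements that fit side by side on the data path."""
    if elem_bits <= 0 or data_path_bits % elem_bits != 0:
        raise ValueError("element width must evenly divide the data path width")
    return data_path_bits // elem_bits


def split_fields(value: int, elem_bits: int, reg_bits: int) -> list:
    """Break a register value into its partitioned fields, lowest field first."""
    mask = (1 << elem_bits) - 1
    return [(value >> shift) & mask for shift in range(0, reg_bits, elem_bits)]


def catenate(fields: list, elem_bits: int) -> int:
    """Join per-field results back into one register-wide value."""
    mask = (1 << elem_bits) - 1
    result = 0
    for index, item in enumerate(fields):
        result |= (item & mask) << (index * elem_bits)
    return result
```

For instance, a 128-bit data path carries lanes(128, 16) = 8 sixteen-bit elements per transfer, and catenate(split_fields(v, 8, 32), 8) returns v unchanged for any 32-bit v.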
22. A data processing system comprising:
(a) a bus coupling components in the data processing system;
(b) an external memory coupled to the bus;
(c) a programmable microprocessor coupled to the bus and capable of operation independent of another host processor, the microprocessor comprising:
a data path capable of transmitting data;
an external interface operable to receive data from an external source and communicate the received data over the data path;
first and second register files containing a plurality of registers each having a register width, the first and second register files coupled to the data path and configured to support processing of first and second threads, respectively, and to store a plurality of multiple-bit data elements in partitioned fields, each of the multiple-bit data elements having an elemental width smaller than the register width;
an execution unit coupled to the data path, the execution unit configured to execute first and second instruction streams from the first and second threads, respectively, in a multistage pipeline such that the multistage pipeline is capable of including instructions from different ones of the instruction streams in different stages of the multistage pipeline, the first and second instruction streams each including a single arithmetic instruction that specifies an arithmetic operation to cause multiple instances of the arithmetic operation to be performed, each instance of the arithmetic operation to be performed using a different one of the plurality of multiple-bit data elements in partitioned fields of at least one of the registers to produce a catenated result, the single arithmetic instruction causing a plurality of multiple-bit data elements in partitioned fields to be read in parallel from a register included in the register file, and causing the catenated result to be written in parallel to one of the registers included in the register file; and
wherein each of the multiple-bit data elements has an elemental width, and the data path has a data path width multiple times greater than the elemental width, to allow multiple-bit data elements used for the multiple instances of the arithmetic operation to be transmitted in parallel from the first register file and from the second register file to the execution unit, and wherein the execution unit is operable to receive, in parallel, multiple-bit data elements for the multiple instances of the arithmetic operation and execute the multiple instances of the single arithmetic instruction to produce the catenated result.
Dependent claims: 23, 24, 25, 26
27. A programmable processor comprising:
a data path capable of transmitting data;
an external interface operable to receive data from an external source and communicate the received data over the data path;
a register file containing a plurality of registers each having a register width, the register file coupled to the data path and configured to support processing of a plurality of threads and to store a plurality of multiple-bit data elements in partitioned fields, each of the multiple-bit data elements having an elemental width smaller than the register width;
an execution unit coupled to the data path, the execution unit configured to execute a plurality of instruction streams from the plurality of threads in a multistage pipeline such that the multistage pipeline is capable of including instructions from different ones of the instruction streams in different stages of the multistage pipeline, each instruction stream including a single floating-point arithmetic instruction that specifies a floating-point arithmetic operation to cause multiple instances of the floating-point arithmetic operation to be performed, each instance of the floating-point arithmetic operation to be performed using a different one of the plurality of multiple-bit data elements in partitioned fields of at least one of the registers to produce a catenated result, the single floating-point arithmetic instruction causing a plurality of multiple-bit data elements in partitioned fields to be read in parallel from a register included in the register file, and causing the catenated result to be written in parallel to one of the registers included in the register file; and
wherein each of the multiple-bit data elements has an elemental width, and the data path has a data path width multiple times greater than the elemental width, to allow multiple-bit data elements used for the multiple instances of the floating-point arithmetic operation to be transmitted in parallel from the register file to the execution unit, and wherein the execution unit is operable to receive, in parallel, multiple-bit data elements for the multiple instances of the floating-point arithmetic operation and execute the multiple instances of the single floating-point arithmetic instruction to produce the catenated result.
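Claim 27 applies the same partitioned scheme to floating-point operations. As a hedged illustration only, the sketch below assumes a 128-bit register holding four 32-bit IEEE 754 single-precision elements in little-endian field order, and uses addition as the floating-point operation; none of these specifics come from the claim, and the helper names are hypothetical.

```python
import struct


def pack_f32x4(values) -> int:
    """Pack four floats into one 128-bit register value (assumed layout)."""
    return int.from_bytes(struct.pack("<4f", *values), "little")


def unpack_f32x4(reg: int) -> tuple:
    """Recover the four single-precision fields from a 128-bit register value."""
    return struct.unpack("<4f", reg.to_bytes(16, "little"))


def group_fadd_f32(ra: int, rb: int) -> int:
    """Add the four float32 fields of ra and rb and catenate the results.

    Each sum is rounded to single precision when it is packed back.
    """
    sums = [a + b for a, b in zip(unpack_f32x4(ra), unpack_f32x4(rb))]
    return pack_f32x4(sums)
```

Adding the packed vectors (1.0, 2.5, -3.0, 0.5) and (0.5, 0.5, 1.0, 0.25) and unpacking the result yields (1.5, 3.0, -2.0, 0.75), with each field computed independently of the others.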
28. A data processing system comprising:
(a) a bus coupling components in the data processing system;
(b) an external memory coupled to the bus;
(c) a programmable microprocessor coupled to the bus and capable of operation independent of another host processor, the microprocessor comprising:
a data path capable of transmitting data;
an external interface operable to receive data from an external source and communicate the received data over the data path;
a register file containing a plurality of registers each having a register width, the register file coupled to the data path and configured to support processing of a plurality of threads and to store a plurality of multiple-bit data elements in partitioned fields, each of the multiple-bit data elements having an elemental width smaller than the register width;
an execution unit coupled to the data path, the execution unit configured to execute a plurality of instruction streams from the plurality of threads in a multistage pipeline such that the multistage pipeline is capable of including instructions from different ones of the instruction streams in different stages of the multistage pipeline, each instruction stream including a single floating-point arithmetic instruction that specifies a floating-point arithmetic operation to cause multiple instances of the floating-point arithmetic operation to be performed, each instance of the floating-point arithmetic operation to be performed using a different one of the plurality of multiple-bit data elements in partitioned fields of at least one of the registers to produce a catenated result, the single floating-point arithmetic instruction causing a plurality of multiple-bit data elements in partitioned fields to be read in parallel from a register included in the register file, and causing the catenated result to be written in parallel to one of the registers included in the register file; and
wherein each of the multiple-bit data elements has an elemental width, and the data path has a data path width multiple times greater than the elemental width, to allow multiple-bit data elements used for the multiple instances of the floating-point arithmetic operation to be transmitted in parallel from the register file to the execution unit, and wherein the execution unit is operable to receive, in parallel, multiple-bit data elements for the multiple instances of the floating-point arithmetic operation and execute the multiple instances of the single floating-point arithmetic instruction to produce the catenated result.
Specification