Vector processor architecture and methods performed therein
Abstract
A novel vector processor architecture, and hardware and processing features associated therewith, provide both vector processing and superscalar processing features.
516 Citations
86 Claims
-
3. A vector processor for providing both vector processing and superscalar register processing, comprising:
-
a plurality of vector element slices, each comprising a plurality of functional units;
a plurality of instruction decoders, each associated with a functional unit of one of said vector element slices, for providing instructions to an associated functional unit;
a vector instruction router for routing a vector instruction to all instruction decoders associated with functional units used by said vector instruction; and
a register instruction router for routing a register instruction to instruction decoders associated with a vector element slice and functional units associated with said register instruction.
-
-
4. A method in a vector processor for creating Very Long Instruction Words (VLIW) from component instructions, comprising:
-
fetching a set of instructions from an instruction stream, the instruction stream comprising VLIW component instructions;
identifying said VLIW component instructions according to their respective functional units;
determining a group of VLIW component instructions that may be assigned to a single VLIW; and
assigning the component instructions of the group to specific positions of a VLIW instruction according to their respective functional units. - View Dependent Claims (5, 6, 7, 8, 9)
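A minimal Python sketch of the grouping steps recited in claim 4. The instruction names, functional-unit names, and the one-slot-per-unit VLIW layout are illustrative assumptions; the patent does not specify them.

```python
# Map each component instruction to the functional unit that executes it
# (hypothetical names for illustration only).
UNIT_OF = {"vmul": "multiplier", "vadd": "adder",
           "vload": "load_store", "vbranch": "branch"}
# One VLIW slot per functional-unit type, in a fixed position order.
SLOT_ORDER = ["multiplier", "adder", "load_store", "branch"]

def build_vliw(stream):
    """Greedily pack component instructions into VLIWs, one slot per unit."""
    vliws, current = [], {}
    for insn in stream:
        unit = UNIT_OF[insn]            # identify instruction by functional unit
        if unit in current:             # slot taken -> group is complete
            vliws.append(current)
            current = {}
        current[unit] = insn            # assign slot position by functional unit
    if current:
        vliws.append(current)
    # Render each VLIW with slots in fixed positions (None = empty slot).
    return [[vliw.get(u) for u in SLOT_ORDER] for vliw in vliws]
```

For example, `build_vliw(["vmul", "vadd", "vload", "vmul", "vbranch"])` packs the first three instructions into one VLIW and starts a second when the multiplier slot is needed again.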
-
-
10. A method of designing a vector processor that forms Very Long Instruction Words (VLIW) from VLIW component instructions of an instruction stream, comprising:
-
defining a set of VLIW component instructions, each component instruction being associated with a functional unit of the vector processor;
defining grouping rules for VLIW component instructions that associate component instructions that may be executed in parallel; and
defining associations between VLIW component instructions and specific positions of a VLIW instruction based on the functional unit of the component instruction.
-
-
11. A vector processor that forms Very Long Instruction Words (VLIW) from VLIW component instructions of an instruction stream, comprising:
-
a plurality of vector element slices, each comprising a plurality of functional units;
a plurality of instruction decoders, each associated with a functional unit of one of said vector element slices, for providing instructions to an associated functional unit;
a plurality of routers, each associated with a type of said functional units, for routing instructions to a decoder associated with a functional unit of the routed instruction;
a plurality of pipeline registers, each corresponding to a type of said functional units, for storing instructions provided by instruction decoders corresponding to the same type of functional unit, and a plurality of instruction grouping decoders, for receiving instructions from an instruction stream and providing groups of VLIW component instructions of said stream to said plurality of routers, wherein a VLIW instruction is comprised of instructions stored in respective pipeline registers.
-
-
12. A method to deliver an instruction window, comprising a set of instructions, to a superscalar instruction decoder comprising:
-
fetching two adjacent lines of instructions that together contain a set of instructions to be delivered to the superscalar instruction decoder, each of said lines being at least the size of the set of instructions to be delivered; and
reordering the positions of instructions of the two adjacent lines so as to position first and subsequent elements of the set of instructions to be delivered into first and subsequent positions corresponding to first and subsequent positions of the superscalar instruction decoder. - View Dependent Claims (13, 14)
-
-
15. A method to deliver a set of instructions to a superscalar instruction decoder comprising:
-
obtaining a line of instructions containing at least a set of instructions to be provided to the superscalar instruction decoder;
providing the line of instructions to a rotator network along with a starting position of said set of instructions within the line, the rotator network having respective outputs coupled to inputs of a superscalar instruction decoder; and
controlling the rotator network in accordance with the starting position of the set of instructions to output the first and subsequent instructions of the set of instructions to first and subsequent inputs of the superscalar decoder.
-
-
16. A method to deliver an instruction window, comprising a set of instructions, to a superscalar instruction decoder comprising:
-
obtaining at least a portion of a first line of instructions containing at least a portion of a set of instructions to be delivered to the superscalar instruction decoder;
obtaining at least a portion of a second line of instructions containing at least a remaining portion of said set of instructions;
providing the first and second lines of instructions to a rotator network along with a starting position of said set of instructions, the rotator network having respective outputs coupled to inputs of a superscalar instruction decoder; and
controlling the rotator network in accordance with the starting position of the set of instructions to output the first and subsequent instructions of the set of instructions to first and subsequent inputs of the superscalar decoder. - View Dependent Claims (17, 18)
-
-
19. An apparatus for providing instruction windows, comprising sets of instructions, to a superscalar instruction decoder, comprising:
-
a memory storing lines of superscalar instructions;
a rotator for receiving at least portions of two lines of superscalar instructions that together contain a set of instructions; and
a superscalar decoder having a set of inputs for receiving corresponding first and subsequent instructions of a superscalar instruction window, the rotator network providing the first and subsequent superscalar instructions of the instruction window from within the at least portions of two lines of instructions to the corresponding inputs of the superscalar decoder. - View Dependent Claims (1, 2, 20, 21)
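A Python sketch of the window-delivery mechanism in claims 12-19: two adjacent lines are fetched and a rotator aligns the window's first instruction to decoder input 0. The line width and decoder width below are assumptions for illustration; the claims only require each line to be at least the window size.

```python
LINE_WORDS = 8   # instructions per fetched line (assumption)
WINDOW = 4       # superscalar decoder width (assumption)

def fetch_window(memory, start):
    """Deliver WINDOW instructions starting at 'start' to decoder inputs 0..WINDOW-1."""
    line_no, offset = divmod(start, LINE_WORDS)
    # Fetch two adjacent lines; since each line is at least WINDOW wide,
    # together they always contain the full window.
    two_lines = memory[line_no] + memory[line_no + 1]
    # The rotator network: rotate by 'offset' so the first wanted instruction
    # lands on decoder input 0, the next on input 1, and so on.
    rotated = two_lines[offset:] + two_lines[:offset]
    return rotated[:WINDOW]
```

A window beginning near the end of one line is thus assembled from the tail of that line and the head of the next, with no alignment restriction on the start address.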
-
-
20. The apparatus claimed in claim 19, wherein the rotator reorders the positions of said instructions by rotating the instructions of the at least portions of two lines within the rotator.
-
22. A method to address a memory line of a non-power of 2 multi-word wide memory in response to a linear address comprising:
-
shifting the linear address by a fixed number of bit positions; and
using high order bits of a sum of the shifted linear address and the unshifted linear address to address a memory line. - View Dependent Claims (23)
-
-
24. A method to obtain a starting position of a non-power of 2 multi-word wide memory in response to a linear address comprising:
-
shifting the linear address by a fixed number of bit positions;
adding the shifted linear address to the unshifted linear address to form an intermediate address;
retaining a subset of high order address bits of the intermediate address as a modulo index; and
using low order address bits of the intermediate address and said modulo index in a conversion process to obtain a starting position within a selected memory line. - View Dependent Claims (25, 26)
-
-
27. A method to obtain a starting position of a non-power of 2 multi-word wide memory in response to a linear address comprising:
-
shifting the linear address by a fixed number of bit positions;
adding the shifted linear address to the unshifted linear address to form an intermediate address;
retaining a subset of low order address bits of the intermediate address as a modulo index; and
using said modulo index in a conversion process to obtain a starting position within a selected memory line.
-
-
28. A method to obtain a starting position of a non-power of 2 multi-word wide memory in response to a linear address comprising:
-
isolating a subset of low order address bits of the linear address as a modulo index; and
using said modulo index in a conversion process to obtain a starting position within a selected memory line.
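A Python sketch of the address mapping in claims 22-28 for an assumed 3-word-wide memory. The reciprocal constant is a standard choice (exact for linear addresses below 2**16), not taken from the patent; in hardware, multiplying by such a constant reduces to shifting the address and summing, which is how the claims' shift-and-add can realize the division by a non-power-of-2 line width.

```python
WIDTH = 3        # words per memory line; non-power-of-2 (assumption)
RECIP = 0xAAAB   # ~= 2**17 / 3; multiply-high replaces the divide by 3

def map_address(addr):
    """Return (memory line index, starting position within the line)."""
    line = (addr * RECIP) >> 17   # high-order bits of the scaled address = addr // 3
    start = addr - line * WIDTH   # modulo index: position within the selected line
    return line, start
```

For instance, word address 7 maps to line 2, starting position 1 in a 3-word-wide memory.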
-
-
29. A device for performing an operation on first and second operand data having respective operand formats, comprising:
-
a first hardware register specifying a type attribute representing an operand format of the first data;
a second hardware register specifying a type attribute representing an operand format of the second data;
an operand matching logic circuit determining a common operand format to be used for both of the first and second data in performing said operation based on the first type attribute of the first data and the second type attribute of the second data; and
a functional unit performing the operation in accordance with the common operand format.
-
-
30. A method of providing data to be operated on by an operation, comprising:
-
specifying an operation type attribute representing an operation format of the operation;
specifying in a hardware register an operand type attribute representing an operand format of data to be used by the operation;
determining an operand conversion to be performed on the data to enable performance of the operation in accordance with said operation format based on said operation format and the operand format of the data; and
performing the determined operand conversion. - View Dependent Claims (31, 32, 33, 34)
-
-
35. A method in a computer for providing an operation that is independent of data operand types, comprising:
-
specifying in a hardware register an operation type attribute representing an operation format;
specifying in a hardware register an operand type attribute representing a data operand format; and
performing said operation in a functional unit of the computer in accordance with the specified operation type attribute and the specified operand type attribute. - View Dependent Claims (36, 37)
-
-
38. A method in a computer for providing an operation that is independent of data operand type, comprising:
-
specifying in a hardware register an operand type attribute representing a data operand format of said data operand; and
performing said operation in a functional unit of the computer in accordance with the specified operand type attribute.
-
-
39. A method in a computer for providing an operation that is independent of data operand types, comprising:
-
specifying in a first hardware register an operand type attribute representing an operand format of a first data operand;
specifying in a second hardware register an operand type attribute representing an operand format of a second data operand;
determining in an operand matching logic circuit a common operand format to be used for both of the first and second data in performing said operation based on the first type attribute of the first data and the second type attribute of the second data; and
performing said operation in a functional unit of the computer in accordance with the determined common operand format.
-
-
40. A method for performing operand conversion in a computer device, comprising:
-
specifying in a hardware register an original operand type attribute representing an original operand format of operand data;
specifying in a hardware register a converted operand type attribute representing a converted operand format to which the operand data is to be converted; and
converting the data from the original operand format to the converted operand format in an operand format conversion logic circuit in accordance with the original operand type attribute and the converted operand type attribute. - View Dependent Claims (41, 42, 43, 44, 45, 46, 47)
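A Python sketch of the operand-matching and conversion idea in claims 29-47. The format names and the widest-wins promotion ranking are illustrative assumptions, not taken from the patent.

```python
# Hypothetical promotion ranking: the wider (or floating-point) format wins.
RANK = {"int8": 0, "int16": 1, "int32": 2, "float32": 3}

def common_format(type_a, type_b):
    """Operand-matching logic: choose one format both operands will use."""
    return type_a if RANK[type_a] >= RANK[type_b] else type_b

def convert(value, src, dst):
    """Operand format conversion (here only int -> float, for illustration)."""
    return float(value) if dst == "float32" and src != "float32" else value

def typed_add(a, ta, b, tb):
    """Perform an add after converting both operands to the common format."""
    fmt = common_format(ta, tb)
    return convert(a, ta, fmt) + convert(b, tb, fmt), fmt
```

Because the type attributes live alongside the data (in the claims, in hardware registers), the same add instruction works for any operand-format pairing.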
-
-
48. A method to conditionally perform operations on elements of a vector, comprising:
-
generating a vector enable mask comprising a plurality of bits, each bit corresponding to a respective element of a vector;
generating a vector conditional mask comprising a plurality of bits, each bit corresponding to a respective element of a vector; and
for each of said elements, applying logic to the vector enable mask bit and vector conditional mask bit that correspond to that element to determine if an operation is to be performed for that element.
-
-
49. The method claimed in claim 48, wherein said logic requires the vector enable bit corresponding to an element to be set to enable an operation on the corresponding element to be performed.
-
50. A method to nest conditional controls for elements of a vector comprising:
-
generating a vector enable mask comprising a plurality of bits, each bit corresponding to a respective element of a vector;
generating a vector conditional mask comprising a plurality of bits, each bit corresponding to a respective element of a vector;
saving the vector enable mask to a temporary storage location;
generating a nested vector enable mask comprising a logical combination of the vector enable mask with the vector conditional mask; and
using the nested vector enable mask as a vector enable mask for a subsequent vector operation. - View Dependent Claims (51, 52, 53, 54)
-
-
55. A method to nest conditional controls for elements of a vector comprising:
-
generating a vector enable mask comprising a plurality of bits, each bit corresponding to a respective element of a vector;
generating a vector conditional mask comprising a plurality of bits, each bit corresponding to a respective element of a vector;
saving the vector enable mask to a temporary storage location;
generating a nested vector enable mask by performing a bitwise “and” of the vector enable mask with the vector conditional mask; and
using the nested vector enable mask as a vector enable mask for a subsequent vector operation.
-
-
56. A method to nest conditional controls for elements of a vector comprising:
-
generating a vector enable mask comprising a plurality of bits, each bit corresponding to a respective element of a vector;
generating a vector conditional mask comprising a plurality of bits, each bit corresponding to a respective element of a vector;
saving the vector enable mask to a temporary storage location;
generating a nested vector enable mask by performing a bitwise “and” of the vector enable mask with a bitwise “not” of the vector conditional mask; and
using the nested vector enable mask as a vector enable mask for a subsequent vector operation.
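A Python sketch of the per-element predication in claims 48-56: the enable and conditional masks gate each element, and nesting combines them with AND (the “then” branch, claim 55) or AND-NOT (the “else” branch, claim 56). The vector length and mask values are illustrative.

```python
def apply_masked(op, vec, enable, cond):
    """Perform op on element i only when enable[i] and cond[i] are both set."""
    return [op(x) if e and c else x for x, e, c in zip(vec, enable, cond)]

def nest_then(enable, cond):
    """Nested enable mask for the 'then' branch: bitwise AND (claim 55)."""
    return [e & c for e, c in zip(enable, cond)]

def nest_else(enable, cond):
    """Nested enable mask for the 'else' branch: AND with bitwise NOT (claim 56)."""
    return [e & (1 - c) for e, c in zip(enable, cond)]
```

Per the claims, the outer enable mask is saved to temporary storage before nesting, so it can be restored when the nested conditional region ends.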
-
-
57. A method to improve responsiveness to program control operations in a processor with a long pipeline comprising:
-
providing a separate computational unit designed for program control operations;
positioning said separate computational unit early in the pipeline thereby reducing delays; and
using said separate computational unit to produce a program control result early in the pipeline to control the execution address of a processor.
-
-
58. A method to improve the responsiveness to an operand address computation in a processor with a long pipeline comprising:
-
providing a separate computational unit designed for operand address computations;
positioning said separate computational unit early in the pipeline thereby reducing delays; and
using said separate computational unit to produce a result early in the pipeline to be used as an operand address.
-
-
59. A vector processor comprising:
-
a vector of multipliers computing multiplier results; and
an array adder computational unit computing an arbitrary linear combination of said multiplier results. - View Dependent Claims (60, 61, 62, 63)
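A Python sketch of claim 59's datapath: a vector of multipliers produces element-wise products, and an array adder forms an arbitrary linear combination of those products. The coefficients are illustrative.

```python
def multiply_accumulate(a, b, coeffs):
    """Vector of multipliers feeding an array adder (linear combination)."""
    products = [x * y for x, y in zip(a, b)]             # the multiplier results
    return sum(c * p for c, p in zip(coeffs, products))  # arbitrary linear combination
```

With unit coefficients this reduces to a dot product; other coefficient choices yield weighted sums such as filter taps.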
-
-
64. A device for providing an indication of a processor attempt to access an address yet to be loaded or stored, comprising:
-
a current bulk transfer address register storing a current bulk transfer address;
an ending bulk transfer address register storing an ending bulk transfer address;
a comparison circuit coupled to the current bulk transfer address register and the ending bulk transfer address register, and to said processor, to provide a signal to the processor indicating whether an address received from the processor is between the current bulk transfer address and the ending bulk transfer address. - View Dependent Claims (65, 66)
-
-
67. A device for providing an indication of a processor attempt to access an address yet to be loaded or stored, comprising:
-
a current bulk transfer address register storing a current bulk transfer address;
a comparison circuit coupled to the current bulk transfer address register and to the processor to provide a signal to the processor indicating whether a difference between the current bulk transfer address and an address received from the processor is within a specified stall range. - View Dependent Claims (68, 69)
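A Python sketch of the stall-detection comparisons in claims 64-69, modeling the comparator as a predicate. Register widths and the stall-range constant are assumptions.

```python
def must_stall(access_addr, current, ending):
    """Claim 64: signal when the processor touches an address still inside
    the pending bulk transfer (between current and ending addresses)."""
    return current <= access_addr < ending

def must_stall_range(access_addr, current, stall_range):
    """Claim 67 variant: signal when the difference between the access and
    the current bulk transfer address is within a specified stall range."""
    return 0 <= access_addr - current < stall_range
```

The signal lets the processor overlap a bulk transfer with computation, pausing only when it would read or write data the transfer has not yet delivered.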
-
-
70. A method of controlling processing in a vector processor, comprising:
-
receiving an instruction to perform a vector operation using one or more vector data operands; and
determining a number of vector data elements of the one or more vector data operands to be processed by the vector operation based on a number of vector data elements that constitute each vector data operand and a number of hardware elements available to perform the vector operation.
-
-
71. A method of controlling processing in a vector processor, comprising:
-
receiving instructions to perform a plurality of vector operations, each vector operation using one or more vector data operands;
for each of the plurality of vector operations, determining a number of vector data elements of each of the one or more vector data operands to be processed by the vector operation based on a number of vector data elements that constitute each vector data operand of the operation and a number of hardware elements available to perform the vector operation; and
determining a number of vector data elements to be processed by all of said plurality of operations by comparing the number of vector data elements to be processed for each respective vector operation.
-
-
72. A method in a vector processor to perform a vector operation on all data elements of a vector, comprising:
-
setting a loop counter to a number of vector data elements to be processed;
performing one or more vector operations on vector data elements of said vector;
determining a number of vector data elements processed by said vector operations;
subtracting the number of vector data elements processed from the loop counter;
determining, after said subtraction, whether additional vector data elements remain to be processed; and
if additional vector data elements remain to be processed, performing further vector operations on remaining data elements of said vector. - View Dependent Claims (73)
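A Python sketch of the strip-mined loop in claims 70-75: a loop counter tracks remaining elements, each pass processes at most the number of available hardware elements, and the operand address advances by the number processed. VLEN and the operation are illustrative.

```python
VLEN = 4   # hardware elements available per vector operation (assumption)

def vector_scale(data, factor):
    """Process all elements, VLEN at a time, counting down a loop counter."""
    out, i = [], 0
    counter = len(data)                 # loop counter = elements to process
    while counter > 0:
        n = min(VLEN, counter)          # last iteration may use fewer elements
        out.extend(x * factor for x in data[i:i + n])   # one vector operation
        counter -= n                    # subtract the elements processed
        i += n                          # update operand address (claim 75)
    return out
```

The `min` on the final pass is the reduction recited in claim 74: when fewer than a full vector of elements remain, the hardware processes only the remainder.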
-
-
74. A method in a vector processor to reduce a number of operations performed for a last iteration of a processing loop, comprising:
-
setting a loop counter to a number of vector data elements to be processed;
performing one or more vector operations on data elements of said vector;
determining a number of vector data elements processed by said vector operations;
subtracting the number of vector data elements processed from the loop counter;
determining, after said subtraction, whether additional vector data elements remain to be processed; and
if additional vector data elements remain to be processed, and the number of additional vector data elements to be processed is less than a full vector of data elements, reducing one of available elements used to perform said vector operations and vector data elements available for the last loop iteration.
-
-
75. A method of controlling processing in a vector processor, comprising:
-
performing one or more vector operations on data elements of a vector;
determining a number of data elements processed by said vector operations; and
updating an operand address register by an amount corresponding to the number of data elements processed.
-
-
76. A method of performing a loop operation, comprising:
-
storing, in a match register, a value to be compared to a monitored register;
designating a register as said monitored register;
comparing the value stored in the match register with a value stored in the monitored register, and responding to a result of said comparison in accordance with a program-specified condition by one of branching or repeating a desired sequence of program instructions, thereby forming a program loop. - View Dependent Claims (77, 78, 79)
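A Python sketch of the match-register loop in claims 76-79. Treating the monitored register as a loop index incremented by the body is an assumption for illustration; the claims allow any designated register and condition.

```python
def run_loop(body, match_value):
    """Repeat 'body' until the monitored register equals the match register."""
    monitored = 0                      # the designated monitored register
    while True:
        body(monitored)                # the repeated sequence of instructions
        monitored += 1                 # monitored register updated by the loop
        if monitored == match_value:   # comparison against the match register
            break                      # condition met: exit (or branch)
```

The comparison hardware watches the monitored register continuously, so the loop closes without a separate compare instruction each iteration.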
-
-
80. A method of processing interrupts in a superscalar processor, comprising:
-
monitoring an interrupt line for a signal indicating an interrupt to the superscalar processor;
upon detection of an interrupt signal, fetching a group of instructions to be executed in response to the interrupt, and inhibiting in hardware an address update of a program counter; and
executing the group of instructions. - View Dependent Claims (81)
-
-
82. A method in a vector processor, comprising:
-
receiving an instruction;
determining whether a vector satisfies a condition specified in the instruction; and
if the vector satisfies the condition specified in the instruction, branching to a new instruction. - View Dependent Claims (83)
-
-
84. A method of providing a vector of data as a vector processor operand, comprising:
-
obtaining a line of data containing at least a vector of data to be provided as the vector processor operand;
providing the line of data to a rotator network along with a starting position of said vector of data within the line, the rotator network having respective outputs coupled to vector processor operand data inputs; and
controlling the rotator network in accordance with the starting position of the vector of data to output the first and subsequent data elements of the vector of data to first and subsequent operand data inputs of the vector processor.
-
-
85. A method of providing a vector of data as a vector processor operand, comprising:
-
obtaining at least a portion of a first line of vector data containing at least a portion of a vector processor operand;
obtaining at least a portion of a second line of vector data containing at least a remaining portion of said vector processor operand;
providing the at least a portion of said first line of vector data and the at least a portion of said second line of vector data to a rotator network along with a starting position of said vector data, the rotator network having respective outputs coupled to vector processor operand data inputs; and
controlling the rotator network in accordance with the starting position of the vector data to output the first and subsequent vector data elements to first and subsequent operand data inputs of the vector processor.
-
-
86. A method to read a vector of data for a vector processor operand comprising:
-
reading into a local memory device a series of lines from a larger memory;
obtaining from said local memory device at least a portion of a first line containing a portion of a vector processor operand;
obtaining from said local memory device at least a portion of a second line containing a remaining portion of said vector processor operand;
providing the at least a portion of said first line of vector data and the at least a portion of said second line of vector data to a rotator network along with a starting position of said vector data, the rotator network having respective outputs coupled to vector processor operand data inputs; and
controlling the rotator network in accordance with the starting position of the vector data to output first and subsequent vector data elements to first and subsequent vector processor operand data inputs.
-
Specification