Computation engine with strided dot product
First Claim
1. A system comprising:
a processor configured to issue a first instruction to a computation engine;
the computation engine coupled to the processor, wherein:
the computation engine comprises:
a first memory storing, during use, a first plurality of input vectors that include first vector elements, and
a second memory storing, during use, a second plurality of input vectors that include second vector elements; and
the computation engine is configured, in response to the first instruction, to compute a dot product of a subset of the first vector elements and each of the second vector elements, wherein respective elements of the subset of the first vector elements are separated in the first plurality of input vectors by other elements not in the subset, wherein a number of the other elements is specified by a stride corresponding to the first instruction, and wherein the computation engine is further configured, in response to the first instruction, not to apply the dot product to the first vector elements that are not in the subset.
Abstract
In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
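The strided selection described in the abstract can be illustrated with a short software sketch. This is only a model under stated assumptions, not the hardware described in the patent: the function name `strided_dot`, the `offset` parameter, and the flat-list layout are hypothetical, and the stride is assumed to count the elements skipped between selected elements (so the step between selected indices is stride + 1).

```python
def strided_dot(first, second, stride, offset=0):
    """Dot product of a strided subset of `first` with every element of `second`.

    Assumption (not stated in the source): `stride` is the number of elements
    skipped between selected elements, so the selected indices are
    offset, offset + stride + 1, offset + 2 * (stride + 1), ...
    """
    step = stride + 1
    subset = first[offset::step]  # unselected elements are never touched
    if len(subset) != len(second):
        raise ValueError("strided subset and second operand differ in length")
    return sum(a * b for a, b in zip(subset, second))


# `first` is modeled as three 3-element inner vectors laid out back to back:
# [1, 10, 100], [2, 20, 200], [3, 30, 300]
first = [1, 10, 100, 2, 20, 200, 3, 30, 300]
second = [4, 5, 6]

# Select element 0 of each inner vector: 1*4 + 2*5 + 3*6
result_elem0 = strided_dot(first, second, stride=2)

# Select element 1 of each inner vector: 10*4 + 20*5 + 30*6
result_elem1 = strided_dot(first, second, stride=2, offset=1)
```

With the first operand modeled as a vector of inner vectors, changing `offset` selects a different element of each inner vector, which matches the abstract's description of the stride selecting a specified element of each second vector.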
20 Claims
1. A system comprising:
a processor configured to issue a first instruction to a computation engine;
the computation engine coupled to the processor, wherein:
the computation engine comprises:
a first memory storing, during use, a first plurality of input vectors that include first vector elements, and
a second memory storing, during use, a second plurality of input vectors that include second vector elements; and
the computation engine is configured, in response to the first instruction, to compute a dot product of a subset of the first vector elements and each of the second vector elements, wherein respective elements of the subset of the first vector elements are separated in the first plurality of input vectors by other elements not in the subset, wherein a number of the other elements is specified by a stride corresponding to the first instruction, and wherein the computation engine is further configured, in response to the first instruction, not to apply the dot product to the first vector elements that are not in the subset.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A circuit comprising:
a first input memory storing a first plurality of input vectors, during use;
a second input memory storing a second plurality of input vectors, during use; and
a compute circuit coupled to the first input memory and the second input memory, wherein the compute circuit is configured, responsive to a first instruction, to multiply selected vector elements of the first plurality of input vectors by the second plurality of input vectors, wherein the selected vector elements are separated in the first plurality of input vectors by unselected vector elements of the first plurality of input vectors, wherein a number of the unselected vector elements is specified by a stride associated with the first instruction, and wherein the compute circuit is configured, responsive to the first instruction, not to multiply the unselected vector elements of the first plurality of input vectors by the second plurality of input vectors.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16, 17
18. A system comprising:
a processor configured to issue a first instruction to a computation engine;
the computation engine coupled to the processor, wherein:
the computation engine comprises:
a first memory storing, during use, a first plurality of input vectors that include first vector elements,
a second memory storing, during use, a second plurality of input vectors that include second vector elements, and
a third memory storing, during use, a plurality of results; and
the computation engine further comprises a plurality of multiply accumulate (MAC) circuits, wherein the plurality of MAC circuits are configured to multiply selected first vector elements by second vector elements to generate multiplication results and to add the multiplication results to the plurality of results, and the computation engine performs the multiplications and additions in response to the first instruction, and wherein the selected first vector elements are identified using a stride corresponding to the first instruction and the selected first vector elements are separated in the first plurality of input vectors by non-selected vector elements, wherein a number of the non-selected vector elements is specified by the stride, and wherein the computation engine is configured, in response to the first instruction, not to operate upon non-selected first vector elements of the first plurality of input vectors.
Dependent claims: 19, 20
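The multiply-accumulate behavior recited in claim 18 can be sketched in software as follows. This is a rough model under assumptions, not the claimed circuit: the name `mac_step`, the `offset` parameter, and the one-accumulator-per-lane mapping between products and results are all hypothetical, and the stride is again assumed to count the non-selected elements between selected ones.

```python
def mac_step(results, first, second, stride, offset=0):
    """Accumulate products of strided `first` elements and `second` into `results`.

    Assumptions (not stated in the source): `stride` is the number of
    non-selected elements between selected elements, and each product is
    added to the accumulator in the same lane position.
    """
    step = stride + 1
    selected = first[offset::step]  # non-selected elements are not operated on
    if not (len(selected) == len(second) == len(results)):
        raise ValueError("operand and result lengths must match")
    for lane, (a, b) in enumerate(zip(selected, second)):
        results[lane] += a * b  # multiply, then add to the stored result
    return results


# `results` stands in for the third memory; it persists across instructions.
results = [0, 0, 0]
first = [1, 10, 2, 20, 3, 30]
second = [4, 5, 6]

mac_step(results, first, second, stride=1)            # uses 1, 2, 3
mac_step(results, first, second, stride=1, offset=1)  # uses 10, 20, 30
```

Because the accumulators persist between calls, repeated instructions with different offsets add onto earlier results rather than overwriting them.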
Specification