Method and apparatus for performing multiply-add operations on packed data
Abstract
A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed data.
26 Claims
1. An apparatus for use in a computer system comprising:
a memory having stored therein a first packed data and a second packed data; and
a processor coupled to said memory to receive said first packed data and said second packed data, said processor performing operations on data elements in said first packed data and said second packed data to generate a plurality of data elements in a third packed data in response to receiving an instruction, at least two of said plurality of data elements in said third packed data storing the result of multiply-add operations.
(Dependent claims: 2, 3, 4)
5. An apparatus for use in a computer system comprising:
a first storage area; and
a circuit coupled to said first storage area, said circuit multiplying a value A by a value B to generate a first intermediate result, multiplying a value C by a value D to generate a second intermediate result, multiplying a value E by a value F to generate a third intermediate result, multiplying a value G by a value H to generate a fourth intermediate result, adding said first intermediate result to said second intermediate result to generate a value I, adding said third intermediate result to said fourth intermediate result to generate a value J, and storing said value I and said value J in said first storage area as elements of a first packed data in response to an enable signal.
(Dependent claims: 6, 7)
8. A computer system comprising:
a processor; and
a storage area coupled to said processor having stored therein, a multiply-add instruction for operating on a first packed data and a second packed data, said first packed data containing at least data elements A, B, C, and D each including a predetermined number of bits, said second packed data containing at least data elements E, F, G, and H each including said predetermined number of bits, said processor generating a third packed data containing at least data elements I and J in response to receiving said multiply-add instruction, said data element I equal to (A×E)+(B×F), said data element J equal to (C×G)+(D×H).
(Dependent claims: 9, 10, 11, 12, 13, 14, 18, 19, 20, 21, 22)
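The arithmetic recited in claim 8 can be sketched in C as follows. This is only an illustrative model, not the claimed processor or the definition of any real instruction. The type names `packed4x16` and `packed2x32`, the function name `multiply_add`, and the choice of signed 16-bit source elements with 32-bit result elements are all assumptions; the claim itself only requires that each source element include a predetermined number of bits.

```c
#include <stdint.h>

/* Hypothetical containers: four signed 16-bit source elements and two
   signed 32-bit result elements. The widths are assumed, not claimed. */
typedef struct { int16_t e[4]; } packed4x16;
typedef struct { int32_t e[2]; } packed2x32;

/* first holds {A, B, C, D}; second holds {E, F, G, H}; returns {I, J}. */
packed2x32 multiply_add(packed4x16 first, packed4x16 second)
{
    packed2x32 third;

    /* Sum in 64 bits so the one overflowing case (all four operands of a
       pair equal to -32768) is not undefined behavior; narrowing that
       out-of-range sum to 32 bits is implementation-defined. */
    int64_t i = (int64_t)first.e[0] * second.e[0]    /* A x E */
              + (int64_t)first.e[1] * second.e[1];   /* B x F */
    int64_t j = (int64_t)first.e[2] * second.e[2]    /* C x G */
              + (int64_t)first.e[3] * second.e[3];   /* D x H */

    third.e[0] = (int32_t)i;  /* data element I */
    third.e[1] = (int32_t)j;  /* data element J */
    return third;
}
```

With a first packed data of {1, 2, 3, 4} and a second packed data of {5, 6, 7, 8}, this sketch yields I = 17 and J = 53.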
15. A processor comprising:
a first storage for storing a first packed data containing at least an A, a B, a C, and a D data element;
a second storage area for storing a second packed data containing at least an E, an F, a G, and an H data element;
a multiply-add circuit including;
a first multiplier coupled to said first storage area to receive said A and coupled to said second storage area to receive said E;
a second multiplier coupled to said first storage area to receive said B and coupled to said second storage area to receive said F;
a third multiplier coupled to said first storage area to receive said C and coupled to said second storage area to receive said G;
a fourth multiplier coupled to said first storage area to receive said D and coupled to said second storage area to receive said H;
a first adder coupled to said first multiplier and said second multiplier;
a second adder coupled to said third multiplier and said fourth multiplier; and
a third storage area coupled to said first adder and said second adder, said third storage area having at least a first field and a second field, said first field for storing the output of said first adder as a first data element of a third packed data, said second field for storing the output of said second adder as a second data element of said third packed data.
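The datapath of claim 15 (four multipliers feeding two adders, whose outputs land in two fields of a third storage area) can be modeled in software roughly as below. The sketch assumes each storage area is a 64-bit word, that the source elements are signed 16-bit fields with A and E in the least significant position, and that the two result fields are 32 bits wide. None of these widths or orderings come from the claim, and the names `field16` and `multiply_add_packed64` are hypothetical.

```c
#include <stdint.h>

/* Read the 16-bit field at position idx (0 = least significant) as a
   signed value. Assumes a two's-complement target. */
int16_t field16(uint64_t packed, unsigned idx)
{
    return (int16_t)(uint16_t)(packed >> (16u * idx));
}

/* Software model of the four-multiplier, two-adder structure. */
uint64_t multiply_add_packed64(uint64_t first, uint64_t second)
{
    /* Four multipliers, one per pair of corresponding elements. */
    int32_t m1 = (int32_t)field16(first, 0) * field16(second, 0); /* A x E */
    int32_t m2 = (int32_t)field16(first, 1) * field16(second, 1); /* B x F */
    int32_t m3 = (int32_t)field16(first, 2) * field16(second, 2); /* C x G */
    int32_t m4 = (int32_t)field16(first, 3) * field16(second, 3); /* D x H */

    /* Two adders; unsigned arithmetic keeps wraparound well defined. */
    uint32_t field1 = (uint32_t)m1 + (uint32_t)m2;
    uint32_t field2 = (uint32_t)m3 + (uint32_t)m4;

    /* Third storage area: first field in the low half, second in the high. */
    return ((uint64_t)field2 << 32) | field1;
}
```

Doing the additions in `uint32_t` is a design choice of this sketch: it makes the result wrap rather than overflow when both products are at their maximum magnitude.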
16. An apparatus for use in a computer system comprising:
a first storage area having at least a first field and a second field; and
a circuit, coupled to said first storage area, operating in response to a signal, said circuit comprising;
a multiplication means for multiplying a value A by a value B to generate a first intermediate result, multiplying a value C by a value D to generate a second intermediate result, multiplying a value E by a value F to generate a third intermediate result, and multiplying a value G by a value H to generate a fourth intermediate result; and
an arithmetic means for adding said first intermediate result and said second intermediate result to generate a value I, and adding said third intermediate result and said fourth intermediate result to generate a value J; and
a storage means for storing said value I in said first field and said value J in said second field as a first packed data.
17. An apparatus for use in a computer system comprising:
a memory having stored therein a first packed data and a second packed data each containing initial data elements, each of said initial data elements in said first packed data having a corresponding initial data element in said second packed data;
a circuit, coupled to said first storage area, operating in response to a signal, said circuit comprising;
a multiplication means for multiplying together said corresponding initial data elements in said first packed data and said second packed data to generate corresponding intermediate data elements, said intermediate data elements being divided into a number of sets;
an arithmetic means for generating a plurality of result data elements, a first of said plurality of result data elements representing the sum of said intermediate result data elements in a first of said number of sets, a second of said plurality of result data elements representing the sum of said intermediate result data elements in a second of said number of sets; and
a storage means for storing said result data elements as a third packed data in said memory.
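Claim 17 generalizes the operation: corresponding elements are multiplied, the products are divided into sets, and each result element is the sum of one set. A minimal C sketch of that idea follows. It assumes the sets are formed from consecutive products and all have the same size, which the claim does not specify; the function name `multiply_add_sets` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply corresponding elements of a and b (n elements each), group the
   products into consecutive sets of set_size (an assumed grouping), and
   write the sum of each set to out. Returns the number of result elements,
   or 0 if n is not a multiple of set_size. */
size_t multiply_add_sets(const int16_t *a, const int16_t *b, size_t n,
                         size_t set_size, int64_t *out)
{
    if (set_size == 0 || n % set_size != 0) {
        return 0;
    }

    size_t sets = n / set_size;
    for (size_t s = 0; s < sets; ++s) {
        int64_t sum = 0;
        for (size_t k = 0; k < set_size; ++k) {
            size_t i = s * set_size + k;
            sum += (int64_t)a[i] * b[i];   /* intermediate data element */
        }
        out[s] = sum;                      /* one result data element */
    }
    return sets;
}
```

With eight elements and `set_size` of 4, the function produces two result elements; with `set_size` of 2 it produces four.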
23. An apparatus for use in a computer system comprising:
a memory having stored therein a first packed data and a second packed data, said first packed data storing a first plurality of sets of data elements, each of said first plurality of sets of data elements having a corresponding set of data elements in said second packed data; and
a processor coupled to said memory to receive said first packed data and said second packed data, said circuit storing in a third storage area a plurality of data element as a third packed data in response to receiving an instruction, each of said plurality of data elements storing the dot product of one of said first plurality of sets of data elements in said first packed data and said corresponding set of data elements in said second packed data.
24. In a computer system, a method comprising the steps of:
A) receiving an instruction; and
B) performing the following steps in response to receiving said instruction, B1) multiplying together a first value and a second value to generate a first intermediate result, B2) multiplying together a third value and a fourth value to generate a second intermediate result, B3) multiplying together a fifth value and a sixth value to generate a third intermediate result, B4) multiplying together a seventh value and an eighth value to generate a fourth intermediate result, B5) adding together said first intermediate result and said second intermediate result to generate a first data element in a first packed data, B6) adding together said third intermediate result and said fourth intermediate result to generate a second data element in said first packed data;
B7) storing said first packed data in a first storage area.
25. In a computer system, a method for manipulating a first packed data and a second packed data, said first packed data including A1, A2, A3, and A4 as data elements, said second packed data including B1, B2, B3, and B4 as data elements, said method comprising the steps of:
receiving an instruction; and
performing the following steps in response to receiving said instruction, performing the operation (A1×B1)+(A2×B2) to generate a first data element in a third packed data;
performing the operation (A3×B3)+(A4×B4) to generate a second data element in said third packed data;
storing said third packed data in a first storage area.
26. In a computer system having stored therein a first packed data and a second packed data each containing initial data elements, each of said initial data elements in said first packed data having a corresponding initial data element in said second packed data, a method for performing multiply add operations, said method comprising the steps of:
receiving an instruction; and
performing the following steps in response to receiving said instruction, multiplying together said corresponding initial data elements in said first packed data and said second packed data to generate corresponding intermediate data elements, said intermediate data elements being divided into a number of sets;
generating a plurality of result data elements, a first of said plurality of result data elements representing the sum of said intermediate result data elements in a first of said number of sets, a second of said plurality of result data elements representing the sum of said intermediate result data elements in a second of said number of sets; and
storing said plurality of result data elements as a third packed data in a memory.
Specification