Processor having an arithmetic extension of an instruction set architecture
Abstract
A processor having an arithmetic extension of an instruction set architecture which incorporates a set of high performance floating point operations. The instruction set architecture incorporates a variety of data formats including single precision and double precision data formats, as well as the paired-single data format that allows two simultaneous operations on a pair of operands. The extension includes instructions directed to reduction add, reduction multiply, reciprocal, and reciprocal square root.
61 Claims
-
1. In a processor, a method for performing computer graphics calculations, said method comprising:
-
representing a vertex in a computer graphics image with a plurality of world coordinates;
transforming said plurality of world coordinates into a plurality of transformed world coordinates using a floating point reduction add instruction; and
performing perspective division on said plurality of transformed world coordinates using a floating point reciprocal instruction, wherein said floating point reciprocal instruction processes a plurality of operands to generate a plurality of reduced precision reciprocal values.
reading a first value from a first position of a first data set and a second value from a second position of said first data set;
adding said first value and said second value, and placing the result in a first position of a destination data set;
reading a third value from a first position of a second data set and a fourth value from a second position of said second data set; and
adding said third value and said fourth value, and placing the result in a second position of said destination data set.
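The four reduction add steps recited above can be modeled in a few lines of Python. This is only an illustrative sketch: the function name `addr`, the use of 2-tuples to stand for the data sets, and the choice of index 0 as the "first position" are assumptions made here, not details taken from the claims.

```python
from typing import Tuple

# Hypothetical stand-in for a two-element data set (e.g. a paired-single value).
Pair = Tuple[float, float]


def addr(first: Pair, second: Pair) -> Pair:
    """Model of a reduction add over two two-element data sets."""
    # Add the two values of the first data set -> first destination position.
    dest_first = first[0] + first[1]
    # Add the two values of the second data set -> second destination position.
    dest_second = second[0] + second[1]
    return (dest_first, dest_second)


result = addr((1.0, 2.0), (3.0, 4.0))
```

Each source data set is collapsed to a single sum, so one instruction yields two independent sums packed into one destination.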
-
-
3. The method for performing computer graphics calculations as recited in claim 2 wherein said floating point reduction add instruction is an ADDR instruction.
-
4. The method for performing computer graphics calculations as recited in claim 1 wherein said floating point reciprocal instruction comprises:
calculating said plurality of reduced precision reciprocal values with values from a plurality of lookup tables configured in parallel.
-
5. The method for performing computer graphics calculations as recited in claim 4 wherein said floating point reciprocal instruction is a RECIP1 instruction.
-
6. The method for performing computer graphics calculations as recited in claim 4 further comprising:
calculating a plurality of full precision reciprocal values using said plurality of reduced precision reciprocal values.
-
7. The method for performing computer graphics calculations as recited in claim 6 wherein said plurality of full precision reciprocal values is determined using a Newton-Raphson approximation.
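As a rough illustration of how a Newton-Raphson approximation can turn a reduced precision reciprocal into a more precise one, the sketch below uses the standard iteration x' = x(2 - dx), under which the error 1 - dx is squared at each step. The seed function `reduced_recip` (a reciprocal truncated to 8 mantissa bits) is a hypothetical stand-in for the reduced precision value; the claims do not specify the seed precision or the number of iterations.

```python
import math


def reduced_recip(d: float, bits: int = 8) -> float:
    """Hypothetical low-precision seed: 1/d with the mantissa cut to `bits` bits."""
    m, e = math.frexp(1.0 / d)  # 1/d == m * 2**e, with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(math.trunc(m * scale) / scale, e)


def refine_recip(d: float, x: float) -> float:
    """One Newton-Raphson step for f(x) = 1/x - d, i.e. x' = x * (2 - d * x)."""
    return x * (2.0 - d * x)


d = 3.0
x0 = reduced_recip(d)     # reduced precision estimate of 1/3
x1 = refine_recip(d, x0)  # after one refinement step
x2 = refine_recip(d, x1)  # after two refinement steps
errors = [abs(1.0 - d * x) for x in (x0, x1, x2)]
```

The list `errors` shrinks quickly from one entry to the next, which is why a coarse table-based estimate followed by a small number of multiply and subtract operations can be sufficient.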
-
8. The method for performing computer graphics calculations as recited in claim 1 wherein said plurality of world coordinates and said plurality of transformed world coordinates are in a paired-single data format.
-
9. The method for performing computer graphics calculations as recited in claim 1 further comprising a floating point reduction multiply instruction comprising:
-
reading a first value from a first position of a first data set and a second value from a second position of said first data set;
multiplying said first value and said second value, and placing the result in a first position of a destination data set;
reading a third value from a first position of a second data set and a fourth value from a second position of said second data set; and
multiplying said third value and said fourth value, and placing the result in a second position of said destination data set.
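The reduction multiply recited above follows the same read, combine, and place pattern as a reduction add, with multiplication as the combining operation. A minimal sketch, assuming 2-tuples as data sets and using the hypothetical helper names `reduce_pairs` and `mulr`:

```python
import operator
from typing import Callable, Tuple

Pair = Tuple[float, float]  # hypothetical two-element data set


def reduce_pairs(op: Callable[[float, float], float], first: Pair, second: Pair) -> Pair:
    """Combine the two values within each data set using `op`."""
    # The first data set feeds the first destination position,
    # the second data set feeds the second destination position.
    return (op(first[0], first[1]), op(second[0], second[1]))


def mulr(first: Pair, second: Pair) -> Pair:
    """Model of a reduction multiply: the product of each data set's two values."""
    return reduce_pairs(operator.mul, first, second)


products = mulr((2.0, 3.0), (4.0, 5.0))
```

Passing the operator as a parameter is a design choice of this sketch only; it makes it easy to see that reduction add and reduction multiply differ solely in the combining function.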
-
-
10. The method for performing computer graphics calculations as recited in claim 9 wherein said floating point reduction multiply instruction is a MULR instruction.
-
11. The method for performing computer graphics calculations as recited in claim 1 wherein said floating point reciprocal instruction comprises:
calculating said plurality of reduced precision reciprocal values with values from at least one lookup table.
-
12. In a processor, a method for performing computer graphics calculations, said method comprising:
-
representing a vertex in a computer graphics image with a plurality of surface normal coordinates;
transforming said plurality of surface normal coordinates into a plurality of transformed surface normal coordinates using a floating point reduction add instruction; and
normalizing said plurality of transformed surface normal coordinates using a floating point reciprocal square root instruction, wherein said reciprocal square root instruction processes a plurality of operands to generate a plurality of reduced precision reciprocal square root values.
computing a dot product between said plurality of transformed surface normal coordinates and a vector using said floating point reduction add instruction;
computing a halfway vector in a lighting calculation; and
normalizing said halfway vector using said floating point reciprocal square root instruction.
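The halfway vector steps above can be illustrated with the conventional definition from lighting models, H = (L + V) / |L + V|, where L is the direction to the light and V the direction to the viewer. That definition, the names `normalize` and `halfway`, and the use of an exact `1 / sqrt` in place of a reciprocal square root instruction are assumptions of this sketch rather than details stated in the claims.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    """Scale v to unit length by multiplying with 1 / sqrt(v . v)."""
    # Stand-in for the reciprocal square root step, computed exactly here.
    inv_len = 1.0 / math.sqrt(sum(c * c for c in v))
    return (v[0] * inv_len, v[1] * inv_len, v[2] * inv_len)


def halfway(light_dir: Vec3, view_dir: Vec3) -> Vec3:
    """Unit vector halfway between the light and view directions."""
    l = normalize(light_dir)
    v = normalize(view_dir)
    # Undefined when l == -v (the sum is the zero vector); not handled here.
    return normalize((l[0] + v[0], l[1] + v[1], l[2] + v[2]))


h = halfway((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Normalization needs only multiplications once the reciprocal square root of the squared length is available, which is why that operation is a natural fit for this step.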
-
-
14. The method for performing computer graphics calculations as recited in claim 12 wherein said floating point reduction add instruction comprises:
-
reading a first value from a first position of a first data set and a second value from a second position of said first data set;
adding said first value and said second value, and placing the result in a first position of a destination data set;
reading a third value from a first position of a second data set and a fourth value from a second position of said second data set; and
adding said third value and said fourth value, and placing the result in a second position of said destination data set.
-
-
15. The method for performing computer graphics calculations as recited in claim 14 wherein said floating point reduction add instruction is an ADDR instruction.
-
16. The method for performing computer graphics calculations as recited in claim 12 wherein said floating point reciprocal square root instruction comprises:
calculating said plurality of reduced precision reciprocal square root values with values from a plurality of lookup tables configured in parallel.
-
17. The method for performing computer graphics calculations as recited in claim 16 wherein said floating point reciprocal square root instruction is a RSQRT1 instruction.
-
18. The method for performing computer graphics calculations as recited in claim 16 further comprising:
calculating a plurality of full precision reciprocal square root values using said plurality of reduced precision reciprocal square root values.
-
19. The method for performing computer graphics calculations as recited in claim 18 wherein said plurality of full precision reciprocal square root values is determined using a Newton-Raphson approximation.
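One common Newton-Raphson form for the reciprocal square root is y' = y(3 - dy²)/2. The sketch below applies it to a coarse seed and then uses the result to normalize a vector. The seed function `rsqrt_seed` (truncated to 8 mantissa bits) and the default of two refinement steps are assumptions chosen for illustration; the claims do not fix either.

```python
import math


def rsqrt_seed(d: float, bits: int = 8) -> float:
    """Hypothetical reduced precision estimate of 1/sqrt(d) for d > 0."""
    m, e = math.frexp(1.0 / math.sqrt(d))  # value == m * 2**e, 0.5 <= m < 1
    scale = 1 << bits
    return math.ldexp(math.floor(m * scale) / scale, e)


def rsqrt(d: float, steps: int = 2) -> float:
    """Refine the seed with Newton-Raphson steps y' = y * (3 - d*y*y) / 2."""
    y = rsqrt_seed(d)
    for _ in range(steps):
        y = y * (3.0 - d * y * y) * 0.5
    return y


def normalize(v):
    """Scale v to approximately unit length using the refined estimate."""
    inv_len = rsqrt(sum(c * c for c in v))
    return tuple(c * inv_len for c in v)


unit = normalize((3.0, 4.0, 0.0))
```

Each step uses only multiplications and one subtraction, so no division or square root is needed after the initial estimate.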
-
20. The method for performing computer graphics calculations as recited in claim 12 wherein said plurality of surface normal coordinates and said plurality of transformed surface normal coordinates are in a paired-single data format.
-
21. The method for performing computer graphics calculations as recited in claim 12 further comprising a floating point reduction multiply instruction comprising:
-
reading a first value from a first position of a first data set and a second value from a second position of said first data set;
multiplying said first value and said second value, and placing the result in a first position of a destination data set;
reading a third value from a first position of a second data set and a fourth value from a second position of said second data set; and
multiplying said third value and said fourth value, and placing the result in a second position of a destination data set.
-
-
22. The method for performing computer graphics calculations as recited in claim 21 wherein said floating point reduction multiply instruction is a MULR instruction.
-
23. The method for performing computer graphics calculations as recited in claim 12 further comprising:
computing a dot product between said plurality of transformed surface normal coordinates and a vector using said floating point reduction add instruction.
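One way a reduction add can complete a dot product is sketched below: the element-wise products are first accumulated into two partial sums per vector, and a single reduction add then collapses two such partial-sum pairs into two finished dot products. The helper names (`mul_ps`, `add_ps`, `addr`, `dot4_pair`), the 4-element layout with a zero fourth component for normals, and the instruction ordering are assumptions of this sketch, not a sequence taken from the claims.

```python
def mul_ps(a, b):
    """Element-wise multiply of two pairs."""
    return (a[0] * b[0], a[1] * b[1])


def add_ps(a, b):
    """Element-wise add of two pairs."""
    return (a[0] + b[0], a[1] + b[1])


def addr(first, second):
    """Reduction add: sum within each pair, one result per source pair."""
    return (first[0] + first[1], second[0] + second[1])


def dot4_pair(row_a, row_b, vec):
    """Dot products of two 4-element rows with vec, returned as one pair."""
    v_lo, v_hi = vec[0:2], vec[2:4]
    # Partial sums: (r0*x + r2*z, r1*y + r3*w) for each row.
    part_a = add_ps(mul_ps(row_a[0:2], v_lo), mul_ps(row_a[2:4], v_hi))
    part_b = add_ps(mul_ps(row_b[0:2], v_lo), mul_ps(row_b[2:4], v_hi))
    # One reduction add finishes both dot products at once.
    return addr(part_a, part_b)


normal = (0.0, 0.0, 1.0, 0.0)   # hypothetical transformed normal, padded with 0
light = (0.0, 0.6, 0.8, 0.0)    # hypothetical light direction
view = (0.0, 0.0, 1.0, 0.0)     # hypothetical view direction
n_dot_l, n_dot_v = dot4_pair(light, view, normal)
```

The same helper applies to a matrix transform, since each transformed coordinate is the dot product of one matrix row with the input vector.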
-
24. The method for performing computer graphics calculations as recited in claim 12 wherein said floating point reciprocal square root instruction comprises:
calculating said plurality of reduced precision reciprocal square root values with values from at least one lookup table.
-
25. A processor for computer graphics calculations, said processor comprising:
-
a bus;
an instruction dispatch unit coupled to said bus, said instruction dispatch unit for dispatching instructions to a floating point unit; and
said floating point unit coupled to said bus, said floating point unit for executing said instructions to implement a method for performing computer graphics calculations, said method comprising:
representing a vertex in a computer graphics image with a plurality of world coordinates;
transforming said plurality of world coordinates into a plurality of transformed world coordinates using a floating point reduction add instruction; and
performing perspective division on said plurality of transformed world coordinates using a floating point reciprocal instruction, wherein said floating point reciprocal instruction processes a plurality of operands to generate a plurality of reduced precision reciprocal values.
calculating said plurality of reduced precision reciprocal values with values from a plurality of lookup tables configured in parallel.
-
-
27. The processor of claim 26 wherein said method for performing computer graphics calculations further comprises:
calculating a plurality of full precision reciprocal values using said plurality of reduced precision reciprocal values.
-
28. The processor of claim 27 wherein said plurality of full precision reciprocal values is determined using a Newton-Raphson approximation.
-
29. The processor of claim 26 wherein said floating point reciprocal instruction is a RECIP1 instruction.
-
30. The processor of claim 25 wherein said floating point reduction add instruction is an ADDR instruction.
-
31. The processor of claim 25 wherein said method for performing computer graphics calculations comprises:
calculating said plurality of reduced precision reciprocal values with values from at least one lookup table.
-
32. The processor of claim 28 wherein said plurality of world coordinates and said plurality of transformed world coordinates are in a paired-single data format.
-
33. The processor of claim 25 wherein said method for performing computer graphics calculations further comprises a floating point reduction multiply instruction, wherein said multiply instruction is a MULR instruction.
-
34. A processor for computer graphics calculations, said processor comprising:
-
a bus;
an instruction dispatch unit coupled to said bus, said instruction dispatch unit for dispatching instructions to a floating point unit; and
said floating point unit coupled to said bus, said floating point unit for executing said instructions to implement a method for performing computer graphics calculations, said method comprising:
representing a vertex in a computer graphics image with a plurality of surface normal coordinates;
transforming said plurality of surface normal coordinates into a plurality of transformed surface normal coordinates using a floating point reduction add instruction;
normalizing said plurality of transformed surface normal coordinates using a floating point reciprocal square root instruction, wherein said reciprocal square root instruction processes a plurality of operands to generate a plurality of reduced precision reciprocal square root values;
computing a dot product between said plurality of transformed surface normal coordinates and a vector using said floating point reduction add instruction;
computing a halfway vector in a lighting calculation; and
normalizing said halfway vector using said floating point reciprocal square root instruction.
calculating said plurality of reduced precision reciprocal square root values with values from a plurality of lookup tables configured in parallel.
-
-
36. The processor of claim 35 wherein said method for performing computer graphics calculations further comprises:
calculating a plurality of full precision reciprocal square root values using said plurality of reduced precision reciprocal square root values.
-
37. The processor of claim 36 wherein said plurality of full precision reciprocal square root values is determined using a Newton-Raphson approximation.
-
38. The processor of claim 35 wherein said floating point reciprocal square root instruction is a RSQRT1 instruction.
-
39. The processor of claim 34 wherein said floating point reduction add instruction is an ADDR instruction.
-
40. The processor of claim 34 wherein said method for performing computer graphics calculations comprises:
calculating said plurality of reduced precision reciprocal square root values with values from at least one lookup table.
-
41. The processor of claim 34 wherein said plurality of surface normal coordinates and said plurality of transformed surface normal coordinates are in a paired-single data format.
-
42. The processor of claim 34 wherein said method for performing computer graphics calculations further comprises a floating point reduction multiply instruction, wherein said multiply instruction is a MULR instruction.
-
43. In a processor including a memory and an execution unit (“EU”), a method for determining a plurality of reduced precision values from a plurality of operands comprising:
-
storing a first instruction in said memory, said first instruction being formatted to operate on said plurality of operands in parallel;
dispatching said first instruction for execution by said EU; and
executing said first instruction in said EU, wherein said executing includes:
accessing in parallel a plurality of lookup tables in said EU to obtain a plurality of first intermediate results, wherein each lookup table is accessed with a first portion of a corresponding operand;
modifying in parallel a second portion of each of said plurality of operands to obtain a plurality of second intermediate results; and
arithmetically combining in parallel said plurality of first intermediate results with said plurality of second intermediate results to obtain said plurality of reduced precision values.
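The three executing steps above (table lookup on one portion of each operand, modification of another portion, and an arithmetic combination) can be illustrated for the reciprocal case. In this sketch the "first portion" is taken to be the leading mantissa bits and the "second portion" the exponent; that split, the 7-bit index, the table contents, and the names `RECIP_TABLE`, `recip1_lane` and `recip1_pair` are all assumptions made for illustration. Python evaluates the lanes one after another, so the parallelism is only modeled, and operands are assumed to be nonzero and finite.

```python
import math

INDEX_BITS = 7
TABLE_SIZE = 1 << INDEX_BITS
# Entry i approximates 1/m for mantissas m in [0.5 + i/256, 0.5 + (i + 1)/256),
# using the reciprocal of the interval midpoint.
RECIP_TABLE = [1.0 / (0.5 + (i + 0.5) / (2 * TABLE_SIZE)) for i in range(TABLE_SIZE)]


def recip1_lane(d: float) -> float:
    """Reduced precision 1/d for one nonzero, finite operand."""
    m, e = math.frexp(abs(d))                # |d| == m * 2**e, 0.5 <= m < 1
    index = int((m - 0.5) * 2 * TABLE_SIZE)  # first portion: leading mantissa bits
    looked_up = RECIP_TABLE[index]           # first intermediate result
    new_exp = -e                             # second portion, modified: negated exponent
    # Combine: scale the table value by 2**new_exp and restore the sign.
    return math.copysign(math.ldexp(looked_up, new_exp), d)


def recip1_pair(operands):
    """Apply the lane computation to each operand of a pair."""
    return tuple(recip1_lane(d) for d in operands)


estimates = recip1_pair((4.0, -3.0))
```

With a 7-bit index, the relative error of each estimate stays below 2⁻⁷ in this model, which is the kind of reduced precision value a later refinement step could improve.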
-
47. A system comprising:
-
a processor;
a memory, coupled to said processor, for storing a first plurality of instructions that enables said processor to perform select operations, wherein said first plurality of instructions includes:
a first instruction that enables said processor to combine a first plurality of operands in accordance with a first method, said first method comprising:
reading a first operand from a first position of a first data set and a second operand from a second position of said first data set;
combining said first operand and said second operand of said first data set, and placing the result in a first position of a first destination data set;
reading a third operand from a first position of a second data set and a fourth operand from a second position of said second data set; and
combining said third operand and said fourth operand of said second data set, and placing the result in a second position of said first destination data set; and
a second instruction that enables said processor to determine a plurality of reduced precision values from a second plurality of operands.
reading a first operand from a first position of a third data set and a second operand from a second position of said third data set;
combining said first operand and said second operand of said third data set, and placing the result in a first position of a second destination data set;
reading a third operand from a first position of a fourth data set and a fourth operand from a second position of said fourth data set; and
combining said third operand and said fourth operand of said fourth data set, and placing the result in a second position of said second destination data set.
-
-
53. The system of claim 52 wherein said combining of said first and second data sets is a function being selected from the group consisting essentially of add, subtract, multiply and divide, and said combining of said third and fourth data sets is a function being selected from the group consisting essentially of add, subtract, multiply and divide.
-
54. The system of claim 47 wherein said reduced precision values are reciprocal values.
-
55. The system of claim 54 wherein said reciprocal values are combined with a plurality of three-dimensional world coordinates representing at least one vertex of a primitive in a graphics system to project said coordinates into a two-dimensional space.
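A minimal sketch of how reciprocal values might be combined with vertex coordinates to project them into two dimensions is given below. It assumes homogeneous coordinates (x, y, z, w) with division by w, handles two vertices together to mirror a paired reciprocal, and computes the reciprocals exactly rather than at reduced precision; the name `project_pair` and these conventions are assumptions, not details from the claim.

```python
def project_pair(v0, v1):
    """Project two (x, y, z, w) vertices to 2D by multiplying with 1/w."""
    # One paired reciprocal, modeled exactly here: (1/w0, 1/w1).
    inv_w0, inv_w1 = 1.0 / v0[3], 1.0 / v1[3]
    # Multiplication by the reciprocal replaces a division per coordinate.
    p0 = (v0[0] * inv_w0, v0[1] * inv_w0)
    p1 = (v1[0] * inv_w1, v1[1] * inv_w1)
    return (p0, p1)


screen = project_pair((2.0, 4.0, 6.0, 2.0), (1.0, 1.0, 1.0, 4.0))
```

Computing one reciprocal per vertex and reusing it for every coordinate is the usual reason perspective division is expressed as a reciprocal followed by multiplications.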
-
56. The system of claim 55 wherein said second instruction is a RECIP1 instruction.
-
57. The system of claim 56 wherein said reciprocal square root values are combined with a plurality of surface normal coordinates representing at least one vertex of a primitive in a graphics system to renormalize said coordinates.
-
58. The system of claim 57 wherein said second instruction is a RSQRT1 instruction.
-
59. The system of claim 47 wherein said reduced precision values are reciprocal square root values.
-
60. A system comprising:
-
a memory for holding a plurality of instructions, said plurality of instructions including a first, second and third instruction; and
a processor, coupled to said memory, for executing said plurality of instructions, wherein said processor includes:
a first means for executing said first instruction to produce a plurality of reduced precision reciprocal values;
a second means for executing said second instruction to produce a plurality of reduced precision reciprocal square root values; and
a third means for executing said third instruction for performing a floating point reduction computation, wherein said computation is a function selected from the group consisting essentially of add and multiply.
-
-
61. A computer program product comprising a computer readable medium having a plurality of instructions stored thereon, the plurality of instructions for enabling a processor to perform certain operations, wherein the plurality of instructions include:
-
a first instruction that enables said processor to combine a first plurality of operands in accordance with a method, said method comprising:
reading a first operand from a first position of a first data set and a second operand from a second position of said first data set;
combining said first operand and said second operand, and placing the result in a first position of a destination data set;
reading a third operand from a first position of a second data set and a fourth operand from a second position of said second data set; and
combining said third operand and said fourth operand, and placing the result in a second position of said destination data set; and
a second instruction that enables said processor to determine a plurality of reduced precision values from a second plurality of operands.
-
Specification