Scalable high performance 3D graphics
First Claim
1. A method of rendering graphics using a rendering computation comprising a rasterization pipeline stage, a sample fill pipeline stage and a convolution stage, said stages performed by a plurality of nodes interconnected in a ring topology, individual nodes containing a rasterization pipeline stage and/or both a sample fill pipeline stage and a convolution stage, the method comprising:
assigning interleaves of a super-sampled buffer to respective local memories of the plurality of interconnected nodes, creating a distributed super-sampled frame buffer, said interleaves dedicated to sample fill pipeline stages and/or convolution stages in said nodes;
receiving a sequence of graphic driver commands;
determining graphic commands in accordance with the sequence of graphic driver commands;
assigning the graphic commands to ones of the plurality of interconnected nodes that contain rasterization pipeline stages, this assignment being independent of the assignment of interleaves;
sending, via the ring topology, graphics primitive loop packets specifying the graphic commands to the assigned interconnected nodes;
performing, by at least one of the interconnected nodes, a rasterization pipeline stage of the rendering computation in accordance with the graphics command assigned to the respective nodes to generate corresponding draw pixel loop packets, wherein ones of the interconnected nodes have texture information stored in the local memory, the texture information dedicated to the rasterization pipeline stage in that node;
sending, via the ring topology, by at least one of the nodes performing rasterization, draw pixel loop packets specifying a command for a sample fill pipeline stage of the rendering computation to ones of the plurality of interconnected nodes that contain sample fill pipeline stages in a manner dependent on the assignments of interleaves;
performing, by the ones of the interconnected nodes that receive a sample fill command, a sample fill pipeline stage of the rendering computation, in accordance with the respective received sample fill commands, a result being update of information in the super-sampled buffer interleaves assigned to the respective nodes;
sending, via the ring topology, video pixel loop packets specifying a convolution to ones of the plurality of interconnected nodes that contain convolution stages in a manner dependent on the assignments of interleaves;
performing, by one or more of the interconnected nodes, a convolution stage of the rendering computation, in accordance with information in the interleaves of the super-sampled frame buffer; and
wherein at least one of the interconnected nodes is configured to perform both the rasterization pipeline stage and the sample fill pipeline stage.
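The data flow recited in claim 1 can be sketched in a few lines of Python. This is only an illustrative model, not the patented implementation: the ring size `NUM_NODES`, the interleave function `interleave_owner`, the round-robin policy in `assign_commands`, and the rectangle-shaped commands are all assumptions made for the example. It shows the two independent assignments in the claim: graphic commands go to rasterizing nodes without regard to the interleave, while draw pixel packets are routed according to which node owns the target pixel's interleave.

```python
NUM_NODES = 4  # assumed ring size, chosen for the example


def interleave_owner(x, y, num_nodes=NUM_NODES):
    """Hypothetical interleave: map pixel (x, y) to the node that stores it."""
    return (x + 2 * y) % num_nodes


def assign_commands(commands, raster_nodes):
    """Assumed round-robin policy; it never looks at the interleave."""
    return [(raster_nodes[i % len(raster_nodes)], cmd)
            for i, cmd in enumerate(commands)]


def rasterize(node, cmd):
    """Toy rasterizer: cmd is a rectangle (x0, y0, x1, y1, color).

    Yields one draw pixel packet per covered pixel.
    """
    x0, y0, x1, y1, color = cmd
    for y in range(y0, y1):
        for x in range(x0, x1):
            yield {"src": node, "x": x, "y": y, "color": color}


def render(commands, num_nodes=NUM_NODES):
    """Run commands through rasterization and sample fill.

    Returns one dict per node, modelling that node's frame buffer interleave.
    """
    local_memory = [dict() for _ in range(num_nodes)]
    for node, cmd in assign_commands(commands, list(range(num_nodes))):
        for pkt in rasterize(node, cmd):
            # Routing depends only on the interleave assignment.
            dest = interleave_owner(pkt["x"], pkt["y"], num_nodes)
            # Sample fill: update the interleave held by the destination node.
            local_memory[dest][(pkt["x"], pkt["y"])] = pkt["color"]
    return local_memory
```

Calling `render([(0, 0, 4, 2, "red")])` spreads the eight covered pixels across the four modelled nodes, so no single node holds the whole frame buffer.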
Abstract
A high-speed ring topology. In one embodiment, two base chip types are required: a “drawing” chip, LoopDraw, and an “interface” chip, LoopInterface. Each of these chips has a set of pins that supports an identical high-speed, point-to-point, unidirectional input and output ring interconnect interface: the LoopLink. The LoopDraw chip uses additional pins to connect to several standard memories that form a high-bandwidth local memory sub-system. The LoopInterface chip uses additional pins to support a high-speed host computer interface, at least one video output interface, and possibly also additional non-local interconnects to other LoopInterface chip(s).
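Because the ring interconnect described in the abstract is unidirectional and point to point, a packet reaches its destination by being passed from chip to chip in one direction only. The following minimal sketch models that behaviour; the node numbering and the function names `ring_hops` and `forward` are assumptions for illustration and do not come from the patent.

```python
def ring_hops(src, dst, num_nodes):
    """Number of links a packet crosses going one way round the ring."""
    return (dst - src) % num_nodes


def forward(src, dst, num_nodes):
    """Return the list of nodes visited, passing the packet one hop at a time."""
    path = [src]
    node = src
    while node != dst:
        node = (node + 1) % num_nodes  # only one direction is available
        path.append(node)
    return path
```

For example, on a four-node ring `forward(3, 1, 4)` visits nodes 3, 0 and 1, because the packet cannot travel backwards from node 3 to node 1.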
115 Citations
86 Claims
1. The full text of claim 1 is given under First Claim above.
Dependent claims: 2 to 51
52. A 3D graphics hardware accelerator comprising a plurality of nodes connected to a ring;
each node comprising a loop interface for receiving packets from a neighboring node on the ring and for transmitting packets to another neighboring node on the ring;
each node further comprising a render stage and/or both a sample fill stage and a video output stage;
each of the render stage, sample fill stage and/or video output stage communicating to the ring via the loop interface;
nodes that include a render stage, a sample fill stage and/or a video output stage further comprising a local memory sub-system accessible by the stage(s) via a shared memory access, the local memory sub-system storing a texture store dedicated to the render stage and/or storing an interleave of a super-sampled frame buffer dedicated to the sample fill stage and the video output stage, the interleaves for all nodes collectively creating a distributed super-sampled frame buffer;
each render stage receiving graphics primitive loop packets, executing the graphics rendering specified in the graphics primitive loop packets including accessing the texture store in the local memory sub-system as required by the graphics primitive loop packet, and generating corresponding draw pixel loop packets;
each sample fill stage receiving draw pixel loop packets and, as specified by the draw pixel loop packets, performing a conditional sample update function of samples and/or pixels in the interleave stored in the local memory sub-system;
each video output stage receiving video pixel loop packets;
as specified by the video pixel loop packets, retrieving samples and/or pixels in the interleave stored in the local memory sub-system and using the retrieved samples and/or pixels to modify the video pixel loop packets; and
transmitting the modified video pixel loop packets;
the nodes collectively containing a sufficient number of render stages, sample fill stages and video output stages connected to the ring to implement a 3D graphics rendering pipeline with those three stages.
Dependent claims: 53 to 86
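The video output stage of claim 52 receives a video pixel loop packet, reads samples from its own interleave, modifies the packet, and passes it on. One way to picture this is a packet that accumulates partial results as it travels round the ring. The sketch below is a hedged illustration only: the packet fields (`pixel`, `sum`, `count`), the function names, and the use of a plain box filter (an unweighted average) are assumptions, since the claim does not fix the filter or the packet layout.

```python
def video_output_stage(packet, interleave):
    """Return a modified copy of the packet with this node's samples added.

    `interleave` maps a pixel coordinate to the list of sample values that
    this node stores for that pixel (hypothetical layout).
    """
    samples = interleave.get(packet["pixel"], [])
    return {
        "pixel": packet["pixel"],
        "sum": packet["sum"] + sum(samples),
        "count": packet["count"] + len(samples),
    }


def circulate(pixel, interleaves):
    """Send one video pixel packet past every node in ring order.

    Returns the box-filtered value once all contributions are collected.
    """
    packet = {"pixel": pixel, "sum": 0.0, "count": 0}
    for interleave in interleaves:
        packet = video_output_stage(packet, interleave)
    if packet["count"] == 0:
        return 0.0  # no node held samples for this pixel
    return packet["sum"] / packet["count"]
```

In this model each node contributes only the samples it holds locally, so the final pixel value depends on the packet having visited every node whose interleave covers that pixel.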
Specification