COMPUTATION HARDWARE WITH HIGH-BANDWIDTH MEMORY INTERFACE
Abstract
Various embodiments relating to performing multiple computations are provided. In one embodiment, a computing system includes an off-chip storage device configured to store a plurality of stream elements and associated tags and a computation device. The computation device includes an on-chip storage device configured to store a plurality of independently addressable resident elements, and a plurality of parallel processing units. Each parallel processing unit may be configured to receive one or more stream elements and associated tags from the off-chip storage device and select one or more resident elements from a subset of resident elements driven in parallel from the on-chip storage device. A selected resident element may be indicated by an associated tag as matching a stream element. Each parallel processing unit may be configured to perform one or more computations using the one or more stream elements and the one or more selected resident elements.
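As a rough illustration of the tag-matched selection described above, the following Python sketch models a single parallel processing unit in software. The names (`StreamElement`, `process_stream`), the use of a resident-element address as the tag, and multiply-accumulate as the computation are assumptions made for illustration only; the abstract does not specify any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamElement:
    value: float  # data streamed in from the off-chip storage device
    tag: int      # ASSUMPTION: the tag is the address of the matching resident element


def process_stream(stream, driven_subset):
    """Model one parallel processing unit (hypothetical software analogue).

    driven_subset maps a resident-element address to its value, standing in
    for the subset of resident elements driven in parallel from on-chip storage.
    """
    acc = 0.0
    for elem in stream:
        resident = driven_subset[elem.tag]  # select the resident element the tag indicates
        acc += elem.value * resident        # ASSUMPTION: the computation is multiply-accumulate
    return acc


# Resident elements held on-chip, independently addressable by index.
resident_elements = [1.0, 2.0, 3.0, 4.0]

# A short stream whose tags point at resident elements 0 and 3.
stream = [StreamElement(0.5, 0), StreamElement(2.0, 3)]

# Only the requested resident elements are driven to the unit.
driven = {e.tag: resident_elements[e.tag] for e in stream}
result = process_stream(stream, driven)
```

With these inputs the unit pairs 0.5 with resident element 0 and 2.0 with resident element 3. One workload that fits this pattern is a sparse-by-dense product, where each stream element carries the index of the dense operand it needs.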
20 Claims
1. A computing system comprising:
an off-chip storage device configured to store a plurality of stream elements and associated tags; and
a computation device in communication with the off-chip storage device, the computation device including:
  an on-chip storage device configured to store a plurality of independently addressable resident elements; and
  a plurality of parallel processing units, each parallel processing unit being configured to:
    receive one or more stream elements and associated tags from the off-chip storage device;
    select one or more resident elements from a subset of resident elements driven in parallel from the on-chip storage device, wherein a selected resident element is indicated by an associated tag as matching a stream element; and
    perform one or more computations using the one or more stream elements and the one or more selected resident elements.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computing system comprising:
an off-chip storage device configured to store a plurality of stream elements and associated tags; and
a computation device in communication with the off-chip storage device, the computation device including:
  an on-chip storage device configured to store a plurality of independently addressable resident elements;
  a stream manager configured to:
    receive a plurality of parallel data streams from the off-chip storage device;
    parse each of the plurality of parallel data streams into stream elements and associated tags; and
    send the stream elements and associated tags of each data stream to a different parallel processing unit of a plurality of parallel processing units, wherein all stream elements of a data stream are processed by a single parallel processing unit; and
  each parallel processing unit of the plurality of parallel processing units being configured to:
    receive stream elements and associated tags of a data stream from the stream manager;
    send requests to a priority selector for selected resident elements indicated by the associated tags, wherein the priority selector is configured to aggregate requests received from the plurality of parallel processing units to form a subset of resident elements and drive the subset of resident elements from the on-chip storage device to each of the plurality of parallel processing units;
    select the resident elements from the subset of resident elements driven from the on-chip storage device; and
    perform computations using the stream elements and the selected resident elements.
Dependent claims: 11, 12, 13, 14, 15, 16
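The interaction between the stream manager, the priority selector, and the parallel processing units recited in claim 10 can be sketched in software as follows. This is a hypothetical cycle-by-cycle model, not the claimed hardware. The interleaved `[tag, value, ...]` stream layout, the lowest-address-first priority policy, the `width` limit on driven elements, and the multiply-accumulate computation are all assumptions introduced for illustration.

```python
from collections import namedtuple

StreamElement = namedtuple("StreamElement", ["value", "tag"])


def parse_stream(raw):
    """Stream manager step: split one raw data stream into stream elements and tags.

    ASSUMPTION: the raw stream is interleaved as [tag, value, tag, value, ...].
    """
    return [StreamElement(value=raw[i + 1], tag=raw[i]) for i in range(0, len(raw), 2)]


def priority_select(requests, resident, width):
    """Priority selector: aggregate requests and drive a subset of resident elements.

    ASSUMPTION: at most `width` elements are driven per cycle, lowest address first.
    """
    chosen = sorted(set(requests))[:width]
    return {addr: resident[addr] for addr in chosen}


def run(raw_streams, resident, width=2):
    # Each data stream is handled by exactly one processing unit.
    queues = [parse_stream(raw) for raw in raw_streams]
    sums = [0.0] * len(queues)
    while any(queues):
        # Every unit with pending work requests the element its next tag indicates.
        requests = [q[0].tag for q in queues if q]
        driven = priority_select(requests, resident, width)
        for unit, q in enumerate(queues):
            # A unit proceeds only if its requested element was driven this cycle.
            if q and q[0].tag in driven:
                elem = q.pop(0)
                sums[unit] += elem.value * driven[elem.tag]
    return sums


resident = [10.0, 20.0, 30.0, 40.0]
raw_streams = [[0, 1.0, 2, 2.0], [1, 3.0, 2, 1.0], [3, 2.0]]
sums = run(raw_streams, resident)
```

In this model, units that request the same address in a cycle share one driven element, and a unit whose request is not served stalls until a later cycle. Because the lowest requested address is always served, at least one unit makes progress each cycle.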
17. A method for performing computations with a plurality of parallel processing units of a computation device, the method comprising:
at each parallel processing unit, receiving one or more stream elements and associated tags from an off-chip storage device;
selecting one or more resident elements from a subset of independently addressable resident elements driven in parallel from an on-chip storage device, wherein a selected resident element is indicated by an associated tag as matching a stream element; and
performing one or more computations using the one or more stream elements and the one or more selected resident elements.
Dependent claims: 18, 19, 20
Specification