ITERATION SUPPORT IN A HETEROGENEOUS DATAFLOW ENGINE
First Claim
1. A system comprising:
at least one computing device;
an accelerator associated with the at least one computing device; and
an application programming interface (API) to expose the accelerator for iterative dataflow, wherein iterative dataflow is controlled based at least on an iteration state representing a loop count or a predicate of a computation, the API including:
a multiport configured to:
accept datablocks as input from a plurality of input channels; and
dequeue a datablock from an input channel of the plurality of input channels;
an iterator port configured to maintain the iteration state of the computation associated with the datablock that was dequeued; and
a scheduler configured to provide the datablock to the accelerator based at least on the iteration state.
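The interplay of the three claimed elements can be sketched in Python. This is a hypothetical illustration under assumed names (`Multiport`, `IteratorPort`, `Scheduler`), not the patented implementation:

```python
from collections import deque

class Multiport:
    """Accepts datablocks from several input channels and dequeues one at a time."""
    def __init__(self, num_channels):
        self.channels = [deque() for _ in range(num_channels)]

    def push(self, channel, datablock):
        self.channels[channel].append(datablock)

    def dequeue(self):
        # Return the first available datablock, scanning channels in order.
        for ch in self.channels:
            if ch:
                return ch.popleft()
        return None

class IteratorPort:
    """Maintains the iteration state (here, a simple loop count) for a computation."""
    def __init__(self, max_iterations):
        self.count = 0
        self.max_iterations = max_iterations

    def advance(self):
        self.count += 1

    def done(self):
        return self.count >= self.max_iterations

class Scheduler:
    """Keeps providing the datablock to the accelerator until the iteration
    state maintained by the iterator port says the computation is finished."""
    def __init__(self, accelerator, port):
        self.accelerator = accelerator
        self.port = port

    def run(self, datablock):
        while not self.port.done():
            datablock = self.accelerator(datablock)
            self.port.advance()
        return datablock
```

With a toy "accelerator" that doubles its input and an iterator port limited to three iterations, `Scheduler(lambda x: x * 2, IteratorPort(3)).run(1)` yields 8.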
Abstract
Various embodiments provide techniques and constructs to improve the execution speed of distributed iterative computation using heterogeneous specialized resources including, for example, processors and accelerators. Iteration over an arbitrary sub-graph without loop unrolling, including for algorithms with data-dependent loop termination and large iteration counts (for example, as a result of nested iteration), is supported in a resource-efficient manner without adding vertices to a dataflow graph to represent iteration constructs. Instead, some or all of the existing vertices within the sub-graph that is to be iterated upon have additional and/or modified ports and channels associated with them.
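The core idea can be sketched as follows (hypothetical Python; the function and variable names are invented for illustration): a datablock is re-circulated through an existing vertex over a back-channel until a data-dependent predicate fires, so no vertices are added to the graph and the loop is never unrolled:

```python
from collections import deque

def iterate_subgraph(vertex_fn, predicate, datablock):
    """Re-circulate a datablock through an existing vertex via a back-channel
    until a data-dependent predicate signals termination -- no loop unrolling,
    and no extra control vertices added to the dataflow graph."""
    back_channel = deque([datablock])
    while back_channel:
        block = back_channel.popleft()
        block = vertex_fn(block)          # the existing vertex does its work
        if predicate(block):              # data-dependent loop termination
            return block
        back_channel.append(block)        # route the output back as new input
```

For instance, doubling a value starting from 3 until it reaches at least 100 terminates after six passes with 192, even though the iteration count was not known in advance.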
20 Claims
1. A system (independent claim, set forth in full above). Dependent claims: 2, 3, 4.
5. A method comprising:
receiving identifiers of a plurality of accelerators associated with a computing device, the plurality of accelerators being available to perform accelerator tasks;
receiving a plurality of accelerator tasks at a multiport of the computing device, wherein an accelerator task has an affinity towards one or more of the accelerators and wherein the plurality of accelerator tasks includes an iterative computation;
determining, by the computing device, an accelerator task from the plurality of accelerator tasks that is ready for execution based at least on an iteration state;
determining, from the plurality of accelerators, one or more accelerators that support the iterative computation that is ready for execution;
providing, by the computing device, the accelerator task that is ready for execution to one of the determined one or more accelerators, wherein, when the accelerator task that is ready for execution includes the iterative computation, the providing includes providing the iterative computation to a determined accelerator that supports the iterative computation; and
receiving, by the computing device, at least one result from the one of the one or more determined accelerators.
Dependent claims: 6, 7, 8, 9, 10, 11.
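The dispatch logic recited in this claim might be sketched as follows (hypothetical Python; the task and accelerator fields such as `ready`, `affinity`, and `supports_iteration` are assumptions for illustration):

```python
def schedule(tasks, accelerators):
    """Pick the first task that is ready for execution (per its iteration
    state) and hand it to an accelerator in the task's affinity set.
    An iterative computation is only dispatched to an accelerator that
    supports iteration."""
    for task in tasks:
        if not task["ready"]:
            continue
        for acc_id in task["affinity"]:
            acc = accelerators.get(acc_id)
            if acc is None:
                continue
            if task["iterative"] and not acc["supports_iteration"]:
                continue
            return task["name"], acc_id
    return None
```

Here an iterative task whose affinity lists an accelerator lacking iteration support simply falls through to the next accelerator in its affinity set.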
12. One or more computer storage media having computer-executable instructions recorded thereon, the computer-executable instructions, upon execution, to configure a computer for iterative dataflow with modules comprising:
a graph module configured to represent a plurality of accelerator tasks in a graph including a plurality of nodes and a plurality of edges, wherein a node of the plurality of nodes corresponds to an accelerator task and an edge of the plurality of edges corresponds to a channel that connects two nodes of the plurality of nodes to carry a flow of data as a plurality of datablocks between two of the plurality of accelerator tasks;
a multiport module configured with connections for a plurality of channels in which the plurality of datablocks are queued for input to the multiport, and configured to dequeue an available datablock of the plurality of datablocks according to an assigned priority of the channel in which the available datablock is queued for input to the multiport; and
an iterator port module configured to maintain an iteration state.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20.
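A rough sketch of the graph module and the priority-aware multiport dequeue (hypothetical Python; the class and method names are assumptions, not the claimed modules themselves):

```python
from collections import deque

class Graph:
    """Nodes are accelerator tasks; edges are channels that carry datablocks
    between pairs of tasks."""
    def __init__(self):
        self.nodes = {}                 # task name -> task callable
        self.edges = []                 # (src, dst) pairs, each naming a channel

    def add_task(self, name, fn):
        self.nodes[name] = fn

    def connect(self, src, dst):
        self.edges.append((src, dst))

class PriorityMultiport:
    """Dequeues the available datablock from the highest-priority connected
    channel (here, the lowest numeric priority value wins)."""
    def __init__(self):
        self.channels = {}              # assigned priority -> queued datablocks

    def connect_channel(self, priority):
        self.channels[priority] = deque()

    def push(self, priority, datablock):
        self.channels[priority].append(datablock)

    def dequeue(self):
        for priority in sorted(self.channels):
            if self.channels[priority]:
                return self.channels[priority].popleft()
        return None
```

With two channels at priorities 0 and 1, a datablock queued on the priority-0 channel is dequeued before one queued earlier on the priority-1 channel.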
Specification