MACHINE LEARNING INFERENCE ENGINE SCALABILITY
First Claim
1. A system comprising:
a plurality of computing cores; and
a control unit;
wherein the control unit is configured to select a first computing core of the plurality of computing cores to use in fetching first data on behalf of one or more second computing cores of the plurality of computing cores;
wherein each of the one or more second computing cores of the plurality of computing cores is configured to:
receive the first data fetched and broadcast by the first computing core;
fetch second data different from second data fetched by other computing cores of the plurality of computing cores; and
perform one or more computations using the first data and the second data in order to perform a computing operation.
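The claimed arrangement can be sketched as a small simulation. It is a minimal, hypothetical model: the class and method names (`Memory`, `Core`, `ControlUnit`, `select_fetcher`, `fetch_and_broadcast`) are illustrative and do not come from the patent. The lowest-numbered-core selection policy, the fetcher also taking part in the computation, and the dot product standing in for "the computing operation" are all assumptions. The sketch counts memory fetches to show that the shared (first) data is read only once.

```python
class Memory:
    """Hypothetical shared memory that counts how many fetches it serves."""

    def __init__(self, store):
        self.store = store
        self.fetch_count = 0

    def fetch(self, key):
        self.fetch_count += 1
        return self.store[key]


class Core:
    def __init__(self, core_id):
        self.core_id = core_id
        self.first_data = None  # filled in by a broadcast

    def receive(self, data):
        self.first_data = data

    def fetch_and_broadcast(self, memory, key, targets):
        # Fetch the shared (first) data once and push it to every target core.
        data = memory.fetch(key)
        for core in targets:
            core.receive(data)

    def compute(self, memory, key):
        # Fetch this core's own (second) data; combine it with the shared data.
        # A dot product stands in for the computing operation (assumption).
        second_data = memory.fetch(key)
        return sum(a * b for a, b in zip(self.first_data, second_data))


class ControlUnit:
    def __init__(self, cores, memory):
        self.cores = cores
        self.memory = memory

    def select_fetcher(self):
        # Hypothetical policy: always pick the lowest-numbered core.
        return min(self.cores, key=lambda c: c.core_id)

    def run(self, first_key, second_keys):
        fetcher = self.select_fetcher()
        fetcher.fetch_and_broadcast(self.memory, first_key, self.cores)
        return [core.compute(self.memory, key)
                for core, key in zip(self.cores, second_keys)]


memory = Memory({"w": [1, 2, 3], "x0": [1, 0, 0], "x1": [0, 1, 0],
                 "x2": [0, 0, 1], "x3": [1, 1, 1]})
cores = [Core(i) for i in range(4)]
results = ControlUnit(cores, memory).run("w", ["x0", "x1", "x2", "x3"])
print(results, memory.fetch_count)  # [1, 2, 3, 6] 5
```

With four cores, the shared vector is fetched once and each core fetches one private vector, for five fetches in total. If every core fetched the shared vector itself, the total would be eight.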
2 Assignments
0 Petitions
Abstract
Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch first data and broadcast it to the other inference cores of the inference accelerator engine. Each inference core fetches second data unique to the respective inference core. The inference cores then perform computations on the first and second data in order to implement the machine learning model.
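The bandwidth saving from the broadcast scheme can be worked out directly. With N cores, F bytes of first (shared) data and S bytes of second (per-core) data, broadcasting reads F + N·S bytes from memory, while independent fetching reads N·(F + S) bytes, a saving of (N − 1)·F. The function below is a simplified, hypothetical model: it assumes the broadcast travels over the core interconnect and does not count against memory bandwidth, and the byte sizes in the example are arbitrary.

```python
def memory_traffic(num_cores: int, first_bytes: int, second_bytes: int,
                   broadcast: bool) -> int:
    """Bytes read from memory for one operation across all cores.

    Simplified model: the broadcast itself travels over the core interconnect
    and is not counted as memory traffic.
    """
    if broadcast:
        # One core fetches the first data once; each core still fetches its own second data.
        return first_bytes + num_cores * second_bytes
    # Without broadcast, every core fetches its own copy of the first data.
    return num_cores * (first_bytes + second_bytes)


# Example: 8 cores sharing 4 MB of first data, each with 0.5 MB of unique second data.
shared = memory_traffic(8, 4_000_000, 500_000, broadcast=True)
private = memory_traffic(8, 4_000_000, 500_000, broadcast=False)
print(shared, private)  # 8000000 36000000
```

In this example the broadcast scheme reads 4.5 times less data from memory. The saving grows with both the core count and the size of the shared data.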
20 Claims
1. A system comprising:
a plurality of computing cores; and
a control unit;
wherein the control unit is configured to select a first computing core of the plurality of computing cores to use in fetching first data on behalf of one or more second computing cores of the plurality of computing cores;
wherein each of the one or more second computing cores of the plurality of computing cores is configured to:
receive the first data fetched and broadcast by the first computing core;
fetch second data different from second data fetched by other computing cores of the plurality of computing cores; and
perform one or more computations using the first data and the second data in order to perform a computing operation.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A method comprising:
selecting, by a control unit, a first computing core of a plurality of computing cores to use in fetching first data on behalf of one or more second computing cores of the plurality of computing cores;
each of the one or more second computing cores of the plurality of computing cores:
receiving the first data fetched and broadcast by the first computing core;
fetching second data different from second data fetched by other computing cores of the plurality of computing cores; and
performing one or more computations using the first data and the second data in order to perform a computing operation.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. An apparatus comprising:
a plurality of computing cores;
a control unit; and
a table for mapping machine learning model types to memory bandwidth reduction schemes;
wherein the control unit is configured to access the table to determine a memory bandwidth reduction scheme to use in performing an indicated computing operation, wherein the scheme identifies one or more first computing cores of the plurality of computing cores to use in fetching first data and broadcasting the first data to one or more second computing cores of the plurality of computing cores;
wherein each second computing core of the plurality of computing cores is configured to:
receive the first data fetched and broadcast by a corresponding first computing core;
fetch second data for use by the computing core, wherein the second data fetched by the computing core is different from second data fetched by other computing cores of the plurality of computing cores; and
perform one or more computations using the first data and the second data in order to perform the computing operation.
Dependent claims: 16, 17, 18, 19, 20.
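The table lookup in claim 15 can be sketched as a dictionary keyed by model type. Everything concrete here is hypothetical: the model-type keys, the scheme names, which operand is shared, the fallback scheme, and the fixed `group_size` used to assign each first (fetcher) core to the second cores it serves. The claim states only that such a table exists and that a scheme identifies fetching and receiving cores.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Scheme:
    name: str
    # Operand that a fetcher core reads once and broadcasts; None means no sharing.
    shared_operand: Optional[str]


# Hypothetical table contents: the claim says only that such a table exists.
SCHEME_TABLE: Dict[str, Scheme] = {
    "convolutional": Scheme("broadcast_weights", "weights"),
    "fully_connected": Scheme("broadcast_activations", "activations"),
}
DEFAULT_SCHEME = Scheme("no_broadcast", None)


def plan(model_type: str, num_cores: int,
         group_size: int) -> Tuple[Scheme, Dict[int, List[int]]]:
    """Look up a scheme and map each fetcher (first) core to the cores it feeds."""
    scheme = SCHEME_TABLE.get(model_type, DEFAULT_SCHEME)
    if scheme.shared_operand is None:
        return scheme, {}  # every core fetches all of its own data
    groups: Dict[int, List[int]] = {}
    for start in range(0, num_cores, group_size):
        members = list(range(start, min(start + group_size, num_cores)))
        # The first core of each group fetches and broadcasts to the rest.
        groups[members[0]] = members[1:]
    return scheme, groups


scheme, groups = plan("convolutional", num_cores=8, group_size=4)
print(scheme.name, groups)  # broadcast_weights {0: [1, 2, 3], 4: [5, 6, 7]}
```

Using more than one fetcher core, as claim 15 allows ("one or more first computing cores"), keeps each broadcast group small. An unknown model type falls back to a scheme with no broadcast.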
Specification