MACHINE LEARNING INFERENCE ENGINE SCALABILITY

US 20190325305A1
Filed: 08/30/2018
Published: 10/24/2019
Est. Priority Date: 04/20/2018
Status: Active Grant

Abstract

Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch given data and broadcast the given data to other inference cores of the inference accelerator engine. Each inference core fetches second data unique to the respective inference core. The inference cores then perform computations on the first and second data in order to implement the machine learning model.
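To make the bandwidth argument in the abstract concrete, the following is a minimal sketch that counts bytes read from memory under two policies: every core fetching the shared data itself, versus one core fetching it once and broadcasting it. The function names and the byte counts are illustrative assumptions and do not come from the patent; on-chip broadcast traffic is assumed not to count against memory bandwidth.

```python
def traffic_independent(num_cores, shared_bytes, unique_bytes_per_core):
    """Bytes read from memory when every core fetches the shared data itself
    in addition to its own unique data (assumed baseline, for comparison)."""
    return num_cores * (shared_bytes + unique_bytes_per_core)


def traffic_broadcast(num_cores, shared_bytes, unique_bytes_per_core):
    """Bytes read from memory when one core fetches the shared data once and
    broadcasts it on-chip; each core still fetches its own unique data."""
    return shared_bytes + num_cores * unique_bytes_per_core


if __name__ == "__main__":
    # Hypothetical sizes: 4 cores, 1000 bytes shared, 50 bytes unique per core.
    print(traffic_independent(4, 1000, 50))
    print(traffic_broadcast(4, 1000, 50))
```

Under these assumptions the saving grows with the number of cores, since the shared data is read once rather than once per core.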

20 Claims

1. A system comprising:
- a plurality of computing cores; and
  
  a control unit;
  
  wherein the control unit is configured to select a first computing core of the plurality of computing cores to use in fetching first data on behalf of one or more second computing cores of the plurality of computing cores;
  
  wherein each of the one or more second computing cores of the plurality of computing cores is configured to:
  
  receive the first data fetched and broadcast by the first computing core;
  
  fetch second data different from second data fetched by other computing cores of the plurality of computing cores; and
  
  perform one or more computations using the first data and the second data in order to perform a computing operation.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The system as recited in claim 1, wherein the second data comprises a set of coefficients associated with a corresponding set of filters.
  - 3. The system as recited in claim 1, wherein the first data comprises input channel data.
  - 4. The system as recited in claim 1, wherein the first computing core is selected based on an identified memory bandwidth reduction scheme.
  - 5. The system as recited in claim 4, wherein the control unit is further configured to receive an indication of a computing operation to be performed by the plurality of computing cores, wherein the indication specifies a type of machine learning model to implement.
  - 6. The system as recited in claim 5, wherein the control unit is further configured to determine which portions of the machine learning model to map to the plurality of computing cores based on a size of an input dataset.
  - 7. The system as recited in claim 5, wherein the control unit is further configured to determine which portions of the machine learning model to map to the plurality of computing cores based on a number and size of filters of the machine learning model.
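Claims 1 to 3 can be pictured with a small simulation in which the first data is input channel data and the second data is a per-core set of filter coefficients. This is only a sketch under stated assumptions: the function names, the 1-D valid-mode cross-correlation, the `fetch_log` bookkeeping, and the choice to let the fetching core also compute are all hypothetical and are not specified by the claims.

```python
def conv1d_valid(signal, kernel):
    """Valid-mode 1-D cross-correlation (no kernel flip)."""
    n, k = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(n - k + 1)]


def run_layer(input_channel, filters, fetcher_id=0):
    """Simulate one layer: one core fetches the input, every core fetches
    only its own filter. Returns (per-core outputs, log of memory fetches)."""
    fetch_log = []

    # The selected core reads the input channel data ("first data") once.
    fetch_log.append((fetcher_id, "input"))
    first_data = list(input_channel)

    outputs = []
    for core_id, coeffs in enumerate(filters):
        # Each core receives the broadcast copy of first_data and fetches
        # only its own filter coefficients ("second data").
        fetch_log.append((core_id, f"filter{core_id}"))
        outputs.append(conv1d_valid(first_data, coeffs))
    return outputs, fetch_log


if __name__ == "__main__":
    outs, log = run_layer([1, 2, 3, 4], [[1, 0], [0, 1], [1, 1]])
    print(outs)
    print(log)
```

In this sketch the input appears exactly once in the fetch log regardless of how many cores consume it, while each filter is fetched by exactly one core.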

8. A method comprising:
- selecting, by a control unit, a first computing core of a plurality of computing cores to use in fetching first data on behalf of one or more second computing cores of the plurality of computing cores;
  
  each of the one or more second computing cores of the plurality of computing cores:
  
  receiving the first data fetched and broadcast by the first computing core;
  
  fetching second data different from second data fetched by other computing cores of the plurality of computing cores; and
  
  performing one or more computations using the first data and the second data in order to perform a computing operation.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The method as recited in claim 8, wherein the second data comprises a set of coefficients associated with a corresponding set of filters.
  - 10. The method as recited in claim 8, wherein the first data comprises input channel data.
  - 11. The method as recited in claim 8, wherein the first computing core is selected based on an identified memory bandwidth reduction scheme.
  - 12. The method as recited in claim 11, further comprising receiving an indication of a computing operation to be performed by the plurality of computing cores, wherein the indication specifies a type of machine learning model to implement.
  - 13. The method as recited in claim 12, further comprising determining which portions of the machine learning model to map to the plurality of computing cores based on a size of an input dataset.
  - 14. The method as recited in claim 12, further comprising determining which portions of the machine learning model to map to the plurality of computing cores based on a number and size of filters of the machine learning model.
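Claim 14 leaves open how the number and size of filters determine the mapping. One possible heuristic, offered purely as an assumption and not as the claimed method, is to balance the coefficient count per core by handing each filter, largest first, to the currently least-loaded core. The function name `assign_filters` and the use of coefficient count as the load metric are hypothetical.

```python
import heapq


def assign_filters(filter_sizes, num_cores):
    """Greedily assign filters to cores so per-core coefficient counts stay
    balanced. Returns {core_id: [filter indices]}. Assumed heuristic only."""
    heap = [(0, core) for core in range(num_cores)]  # (current load, core id)
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}

    # Visit filters from largest to smallest (ties keep original order).
    order = sorted(range(len(filter_sizes)), key=lambda i: -filter_sizes[i])
    for idx in order:
        load, core = heapq.heappop(heap)  # least-loaded core so far
        assignment[core].append(idx)
        heapq.heappush(heap, (load + filter_sizes[idx], core))
    return assignment


if __name__ == "__main__":
    # Hypothetical coefficient counts for six filters, mapped onto two cores.
    print(assign_filters([9, 9, 25, 25, 49, 1], 2))
```

A real control unit would likely weigh other factors as well, such as on-chip buffer capacity, which the sketch ignores.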

15. An apparatus comprising:
- a plurality of computing cores;
  
  a control unit; and
  
  a table for mapping machine learning model types to memory bandwidth reduction schemes;
  
  wherein the control unit is configured to access the table to determine a memory bandwidth reduction scheme to use in performing an indicated computing operation, wherein the scheme identifies one or more first computing cores of the plurality of computing cores to use in fetching first data and broadcasting the first data to one or more second computing cores of the plurality of computing cores;
  
  wherein each second computing core of the plurality of computing cores is configured to:
  
  receive the first data fetched and broadcast by a corresponding first computing core;
  
  fetch second data for use by the computing core, wherein the second data fetched by the computing core is different from second data fetched by other computing cores of the plurality of computing cores; and
  
  perform one or more computations using the first data and the second data in order to perform the computing operation.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The apparatus as recited in claim 15, wherein the second data comprises a set of coefficients associated with a corresponding set of filters.
  - 17. The apparatus as recited in claim 15, wherein the first data comprises input channel data.
  - 18. The apparatus as recited in claim 15, wherein the one or more computations are performed as part of a machine learning model.
  - 19. The apparatus as recited in claim 18, wherein the indication of the computing operation specifies a type of machine learning model to implement.
  - 20. The apparatus as recited in claim 19, wherein the control unit is further configured to determine which portions of the machine learning model to map to the plurality of computing cores based on a size of an input dataset.
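The table recited in claim 15 can be sketched as a lookup from model type to a scheme that names the fetching cores, from which each remaining core is paired with a corresponding fetcher. Everything concrete here is a hypothetical illustration: the model type keys, the scheme names, the core indices, and the rule that pairs a core with the nearest lower-numbered fetcher are assumptions, not details from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    name: str
    fetcher_cores: tuple  # cores that fetch and broadcast the first data


# Hypothetical table: model type -> memory bandwidth reduction scheme.
SCHEME_TABLE = {
    "cnn": Scheme("broadcast_input_channels", (0,)),
    "rnn": Scheme("broadcast_per_group", (0, 4)),
}


def plan(model_type, num_cores):
    """Look up the scheme and map each receiving core to its fetcher core."""
    scheme = SCHEME_TABLE[model_type]
    fetchers = sorted(scheme.fetcher_cores)
    receivers = {}
    for core in range(num_cores):
        if core in fetchers:
            continue
        # Assumed pairing rule: nearest fetcher with an id below this core.
        lower = [f for f in fetchers if f <= core]
        receivers[core] = lower[-1] if lower else fetchers[0]
    return scheme.name, receivers


if __name__ == "__main__":
    print(plan("rnn", 8))
```

With more than one fetching core, as in the hypothetical "rnn" entry, each group of receivers listens to its own broadcaster, which matches the claim's wording of "a corresponding first computing core".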

Current Assignee: Advanced Micro Devices, Inc.; ATI Technologies ULC (Advanced Micro Devices, Inc.)
Original Assignee: Advanced Micro Devices, Inc.; ATI Technologies ULC (Advanced Micro Devices, Inc.)
Inventors: Zhang, Lei; Lagudu, Sateesh; Rush, Allen
Granted Patent: US 11,948,073 B2
CPC Class Codes

G06N 3/04   Architecture, e.g. intercon...

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06N 3/063   using electronic means

G06N 3/08   Learning methods
