Method and apparatus for hardware-accelerated machine learning

US 10,846,624 B2
Filed: 02/19/2020
Issued: 11/24/2020
Est. Priority Date: 12/22/2016
Status: Active Grant

Abstract

A multi-functional data processing pipeline for use with machine learning is disclosed. The multi-functional pipeline may comprise a plurality of pipelined data processing engines, the plurality of pipelined data processing engines being configured to perform processing operations, and the pipelined data processing engines can include correlation logic. The multi-functional pipeline can be configured to controllably activate or deactivate each of the pipelined data processing engines in the pipeline in response to control instructions and thereby define a function for the pipeline, each pipeline function being the combined functionality of each activated pipelined data processing engine in the pipeline. In example embodiments, such pipelines can be used to accelerate convolutional layers in machine-learning technology such as convolutional neural networks.
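
The activate/deactivate behavior described in the abstract can be illustrated with a minimal software sketch. The names `Engine` and `Pipeline`, the toy integer operations, and the choice to have a deactivated engine pass data through unchanged are assumptions made for illustration only. The patent describes hardware engines operating in parallel, which this sequential Python model does not reproduce.

```python
from typing import Callable, Iterable, Iterator, List


class Engine:
    """One pipeline stage (hypothetical name, not taken from the patent)."""

    def __init__(self, name: str, op: Callable[[int], int]) -> None:
        self.name = name
        self.op = op
        self.active = False

    def process(self, item: int) -> int:
        # Assumption: a deactivated engine stays in the pipeline and
        # forwards its input unchanged instead of applying its operation.
        return self.op(item) if self.active else item


class Pipeline:
    """A fixed chain of engines whose function depends on which are active."""

    def __init__(self, engines: List[Engine]) -> None:
        self.engines = engines

    def configure(self, control: List[bool]) -> None:
        # Model a control instruction as one activate/deactivate flag per engine.
        if len(control) != len(self.engines):
            raise ValueError("control instruction needs one flag per engine")
        for engine, flag in zip(self.engines, control):
            engine.active = flag

    def stream(self, data: Iterable[int]) -> Iterator[int]:
        # Each item visits every engine in order; only active engines change it.
        for item in data:
            for engine in self.engines:
                item = engine.process(item)
            yield item


pipeline = Pipeline([
    Engine("double", lambda x: x * 2),
    Engine("increment", lambda x: x + 1),
    Engine("negate", lambda x: -x),
])

pipeline.configure([True, False, True])   # pipeline function: negate(double(x))
doubled_negated = list(pipeline.stream([1, 2, 3]))

pipeline.configure([False, True, False])  # pipeline function: increment(x)
incremented = list(pipeline.stream([1, 2, 3]))
```

The same three engines yield two different pipeline functions purely through the control flags, which is the sense in which the pipeline function is "the combined functionality of each activated pipelined data processing engine."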

636 Citations

50 Claims

1. A machine-learning apparatus comprising:
- a feature extractor for a convolutional neural network, wherein the feature extractor is deployed on a member of the group consisting of (1) a reconfigurable logic device, (2) a graphics processing unit (GPU), and (3) a chip multi-processor (CMP), wherein the member comprises a plurality of data processing engines arranged as a multi-functional pipeline through which data is streamed, the pipelined data processing engines configured for operation in parallel with each other;
  
  each pipelined data processing engine being configured to (1) receive streaming data and perform a processing operation on the received streaming data, and (2) be responsive to a control instruction that defines whether that pipelined data processing engine is an activated data processing engine or a deactivated data processing engine, wherein an activated data processing engine is configured to perform its processing operation on streaming data received thereby, and wherein a deactivated data processing engine remains in the pipeline but does not perform its processing operation on streaming data received thereby, the multi-functional pipeline thereby being configured to provide a plurality of different pipeline functions in response to control instructions that are configured to selectively activate and deactivate the pipelined data processing engines, each pipeline function being the combined functionality of each activated pipelined data processing engine in the pipeline at a given time;
  
  wherein each of a plurality of the data processing engines is configured as a convolution engine that convolves first data with second data via correlation logic;
  
  wherein each of another plurality of the data processing engines is configured as a data reduction engine that performs a data reduction operation on data received thereby; and
  
  wherein the multi-functional pipeline is configured to activate a plurality of the convolution engines and a plurality of the data reduction engines at the same time in response to control instructions in order to configure the multi-functional pipeline as the feature extractor for the convolutional neural network.
- Dependent claims 2-20:
  - 2. The apparatus of claim 1 wherein the multi-functional pipeline comprises a plurality of pairs of the convolution engines and the data reduction engines arranged within the multi-functional pipeline in an interleaved order.
  - 3. The apparatus of claim 1 wherein the multi-functional pipeline is configured to deactivate at least one of the convolution engines or data reduction engines while a plurality of the convolution engines and a plurality of the data reduction engines are activated.
  - 4. The apparatus of claim 3 wherein the multi-functional pipeline is further configured to selectively activate and deactivate different mixes of the convolution engines and data reduction engines based on whether the multi-functional pipeline is to operate in a training mode or a classification mode.
  - 5. The apparatus of claim 3 wherein the multi-functional pipeline is further configured to disconnect power from a deactivated convolution engine or data reduction engine while retaining power to the activated convolution engines and data reduction engines.
  - 6. The apparatus of claim 1 wherein the correlation logic of each convolution engine is configured to operate on second data over a sliding window of first data.
  - 7. The apparatus of claim 6 wherein a first convolutional engine in the multi-functional pipeline is configured to process pixel data from an image as the first data and a plurality of weights as the second data.
  - 8. The apparatus of claim 6 wherein each convolution engine includes (1) a data shift register through which first data is streamed and (2) a register that holds second data, and wherein the correlation logic of each convolution engine comprises a plurality of multipliers and summation logic, wherein the multipliers are configured to multiply values in a plurality of cells of the data shift register and the register, and wherein the summation logic is connected to a plurality of outputs of the multipliers to sum the outputs from the multipliers.
  - 9. The apparatus of claim 1 wherein at least one of the data reduction engines is configured to perform a max pooling operation.
  - 10. The apparatus of claim 1 wherein at least one of the data reduction engines is configured to perform an averaging operation.
  - 11. The apparatus of claim 1 wherein at least one of the data reduction engines is configured to perform a sampling operation.
  - 12. The apparatus of claim 1 wherein at least one of the data reduction engines comprises at least two of (1) max pooling logic, (2) averaging logic, and (3) sampling logic, and wherein the at least one data reduction engine is configured to select which of the at least two is used to process data received thereby in response to a data reduction control instruction.
  - 13. The apparatus of claim 1 wherein the activated data processing engines further comprise at least one of (1) an encryption engine, (2) a decryption engine, (3) a compression engine, (4) a decompression engine, and (5) a search engine.
  - 14. The apparatus of claim 1 wherein the multi-functional pipeline further comprises a plurality of parallel paths, each parallel path comprising at least one data processing engine.
  - 15. The apparatus of claim 1 wherein the member comprises the reconfigurable logic device, wherein at least a portion of the multi-functional pipeline resides on the reconfigurable logic device.
  - 16. The apparatus of claim 15 wherein the member further comprises at least one of a GPU and a CMP, and wherein another portion of the multifunctional pipeline resides on the at least one GPU or CMP.
  - 17. The apparatus of claim 15 wherein the re-configurable logic device comprises a field programmable gate array (FPGA), wherein at least a portion of the multi-functional pipeline resides on the FPGA.
  - 18. The apparatus of claim 1 wherein the member comprises the GPU, wherein at least a portion of the multi-functional pipeline resides on the GPU.
  - 19. The apparatus of claim 18 wherein the member further comprises at least one of a reconfigurable logic device and a CMP, and wherein another portion of the multifunctional pipeline resides on the at least one reconfigurable logic device or CMP.
  - 20. The apparatus of claim 1 wherein the member comprises the CMP, wherein at least a portion of the multi-functional pipeline resides on the CMP.
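
The convolution engine structure recited in claims 6 and 8 (a data shift register, a register holding the second data, a bank of multipliers, and summation logic) can be sketched in software as follows. This is a one-dimensional simplification under stated assumptions: the function name `correlate_stream` and the sample values are hypothetical, and the claims themselves contemplate image pixel data rather than a flat list.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Sequence


def correlate_stream(data: Iterable[float],
                     weights: Sequence[float]) -> Iterator[float]:
    """Slide a weight register over streamed data, one output per full window."""
    taps = len(weights)
    # Model of the data shift register: appending a new value pushes out the oldest.
    shift_register: Deque[float] = deque(maxlen=taps)
    for value in data:
        shift_register.append(value)
        if len(shift_register) == taps:
            # One multiplier per register cell, then summation of the products.
            products = [d * w for d, w in zip(shift_register, weights)]
            yield sum(products)


# Hypothetical first data (streamed values) and second data (weights).
window_sums = list(correlate_stream([1, 2, 3, 4, 5], [1, 0, -1]))
```

Because the weights are applied without reversal, this computes a correlation. A mathematical convolution with the same engine would be obtained by loading the weights in reverse order.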

21. A machine-learning method comprising:
- selectively activating a plurality of data processing engines in a multi-functional pipeline in response to a control instruction to define a feature extractor for a convolutional neural network, the multi-functional pipeline being resident on a member of the group consisting of (1) a reconfigurable logic device, (2) a graphics processing unit (GPU), and (3) a chip multi-processor (CMP);
  
  wherein the multi-functional pipeline comprises a plurality of data processing engines through which data is streamed, the pipelined data processing engines configured for operation in parallel with each other, each pipelined data processing engine being configured to (1) receive streaming data and perform a processing operation on the received streaming data, and (2) be responsive to the control instruction that defines whether that pipelined data processing engine is an activated data processing engine or a deactivated data processing engine;
  
  wherein an activated data processing engine is configured to perform its processing operation on streaming data received thereby;
  
  wherein a deactivated data processing engine remains in the pipeline but does not perform its processing operation on streaming data received thereby, the multi-functional pipeline thereby being configured to provide a plurality of different pipeline functions in response to the control instructions that are configured to selectively activate and deactivate the pipelined data processing engines, each pipeline function being the combined functionality of each activated pipelined data processing engine in the pipeline at a given time;
  
  wherein each of a plurality of the data processing engines is configured as a convolution engine that convolves first data with second data via correlation logic;
  
  wherein each of another plurality of the data processing engines is configured as a data reduction engine that performs a data reduction operation on data received thereby; and
  
  wherein a plurality of the selectively activated data processing engines comprise a plurality of the convolution engines and a plurality of the data reduction engines;
  
  streaming data into a first activated convolution engine in the multi-functional pipeline, the streaming data comprising (1) input data to be classified via the convolutional neural network as the first data and (2) weight data as the second data; and
  
  the activated pipelined data processing engines in the multi-functional pipeline performing their data processing operations on data received thereby to perform feature extraction on the input data as part of the convolutional neural network.
- Dependent claims 22-27:
  - 22. The method of claim 21 further comprising:
    - selectively deactivating at least one of the convolution engines or data reduction engines while a plurality of the convolution engines and a plurality of the data reduction engines are activated.
  - 23. The method of claim 22 wherein the selectively activating and deactivating steps comprise selectively activating and deactivating different mixes of the convolution engines and data reduction engines based on whether the multi-functional pipeline is to operate in a training mode or a classification mode.
  - 24. The method of claim 22 wherein the selectively deactivating step comprises disconnecting power from a deactivated convolution engine or data reduction engine while retaining power to the activated convolution engines and data reduction engines.
  - 25. The method of claim 21 wherein at least one of the activated data reduction engines performs a max pooling operation, an averaging operation, and/or a sampling operation.
  - 26. The method of claim 21 wherein the activated data processing engines further comprise at least one of (1) an encryption engine, (2) a decryption engine, (3) a compression engine, (4) a decompression engine, and (5) a search engine.
  - 27. The method of claim 21 wherein the multi-functional pipeline further comprises a plurality of parallel paths, each parallel path comprising at least one data processing engine, and wherein the selectively activating step comprises activating a plurality of data processing engines in different parallel paths so that those data processing engines operate in parallel.
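
The method of claims 21 and 22 can be pictured as streaming data through interleaved convolution and data reduction stages, some of which may be switched off. The sketch below is a one-dimensional software analogy. The helper names (`convolution_stage`, `max_pool_stage`, `extract_features`), the weights, and the input values are all hypothetical, and a deactivated stage is assumed to forward its input unchanged.

```python
from typing import Callable, List, Sequence

Stage = Callable[[List[float]], List[float]]


def convolution_stage(weights: Sequence[float]) -> Stage:
    """Build a stage that correlates its input with fixed weights."""
    def run(data: List[float]) -> List[float]:
        n = len(weights)
        return [sum(d * w for d, w in zip(data[i:i + n], weights))
                for i in range(len(data) - n + 1)]
    return run


def max_pool_stage(width: int) -> Stage:
    """Build a stage that keeps the maximum of each non-overlapping block."""
    def run(data: List[float]) -> List[float]:
        return [max(data[i:i + width])
                for i in range(0, len(data) - width + 1, width)]
    return run


def extract_features(data: List[float],
                     stages: List[Stage],
                     active: List[bool]) -> List[float]:
    # Only activated stages transform the data; the rest are skipped in place.
    for stage, is_active in zip(stages, active):
        if is_active:
            data = stage(data)
    return data


# Two convolution/reduction pairs in interleaved order.
stages = [
    convolution_stage([1, 1]), max_pool_stage(2),
    convolution_stage([1, -1]), max_pool_stage(2),
]
signal = [1, 3, 2, 5, 4, 6, 8, 7]

all_active = extract_features(signal, stages, [True, True, True, True])
first_pair_only = extract_features(signal, stages, [True, True, False, False])
```

Changing only the `active` flags yields a shallower or deeper feature extractor from the same stage list, loosely mirroring the "different mixes" of engines in claim 23.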

28. A machine-learning apparatus comprising:
- a member of the group consisting of (1) a reconfigurable logic device, (2) a graphics processing unit (GPU), and (3) a chip multi-processor (CMP), wherein the member comprises a plurality of data processing engines arranged as a multi-functional pipeline through which data is streamed, the pipelined data processing engines configured for operation in parallel with each other;
  
  each pipelined data processing engine being configured to (1) receive streaming data and perform a processing operation on the received streaming data, and (2) be responsive to a control instruction that defines whether that pipelined data processing engine is an activated data processing engine or a deactivated data processing engine, wherein an activated data processing engine is configured to perform its processing operation on streaming data received thereby, and wherein a deactivated data processing engine remains in the pipeline but does not perform its processing operation on streaming data received thereby, the multi-functional pipeline thereby being configured to provide a plurality of different pipeline functions in response to control instructions that are configured to selectively activate and deactivate the pipelined data processing engines, each pipeline function being the combined functionality of each activated pipelined data processing engine in the pipeline at a given time;
  
  wherein at least one of the data processing engines comprises a convolution engine that serves as a convolutional layer for a convolutional neural network to support machine-learning operations; and
  
  wherein the multi-functional pipeline is configured to selectively activate a plurality of the data processing engines, including the convolution engine, at the same time in response to control instructions in order to configure the multi-functional pipeline to support machine-learning operations.
- Dependent claims 29-50:
  - 29. The apparatus of claim 28 wherein each of a plurality of the data processing engines comprises correlation logic, and wherein at least one of the data processing engines that comprises the correlation logic is configured as the convolution engine.
  - 30. The apparatus of claim 28 wherein the multi-functional pipeline is further configured to disconnect power from a deactivated data processing engine while retaining power to the activated data processing engines.
  - 31. The apparatus of claim 28 wherein the convolution engine comprises correlation logic configured to operate on second data over a sliding window of first data.
  - 32. The apparatus of claim 31 wherein the convolutional engine is configured to process pixel data from an image as the first data and a plurality of weights as the second data.
  - 33. The apparatus of claim 31 wherein the convolution engine includes (1) a data shift register through which first data is streamed and (2) a register that holds second data, and wherein the correlation logic comprises a plurality of multipliers and summation logic, wherein the multipliers are configured to multiply values in a plurality of cells of the data shift register and the register, and wherein the summation logic is connected to a plurality of outputs of the multipliers to sum the outputs from the multipliers.
  - 34. The apparatus of claim 28 wherein at least one of the data processing engines is configured to perform a max pooling operation.
  - 35. The apparatus of claim 28 wherein at least one of the data processing engines is configured to perform an averaging operation.
  - 36. The apparatus of claim 28 wherein at least one of the data processing engines is configured to perform a sampling operation.
  - 37. The apparatus of claim 28 wherein at least one of the data processing engines comprises a data reduction engine, wherein the data reduction engine comprises (1) max pooling logic and (2) averaging logic, and wherein the data reduction engine is configured to select which of the max pooling logic and the averaging logic is used to process data received by the data reduction engine in response to a data reduction control instruction.
  - 38. The apparatus of claim 28 wherein at least one of the data processing engines comprises a data reduction engine, wherein the data reduction engine comprises (1) max pooling logic and (2) sampling logic, and wherein the data reduction engine is configured to select which of the max pooling logic and the sampling logic is used to process data received by the data reduction engine in response to a data reduction control instruction.
  - 39. The apparatus of claim 28 wherein at least one of the data processing engines comprises a data reduction engine, wherein the data reduction engine comprises (1) averaging logic and (2) sampling logic, and wherein the data reduction engine is configured to select which of the averaging logic and the sampling logic is used to process data received by the data reduction engine in response to a data reduction control instruction.
  - 40. The apparatus of claim 28 wherein at least one of the data processing engines comprises a data reduction engine, wherein the data reduction engine comprises (1) max pooling logic, (2) averaging logic, and (3) sampling logic, and wherein the data reduction engine is configured to select which of the max pooling logic, the averaging logic, and the sampling logic is used to process data received by the data reduction engine in response to a data reduction control instruction.
  - 41. The apparatus of claim 28 wherein at least one of the data processing engines comprises a search engine.
  - 42. The apparatus of claim 28 wherein at least one of the data processing engines comprises a decryption engine.
  - 43. The apparatus of claim 28 wherein the multi-functional pipeline further comprises a plurality of parallel paths, each parallel path comprising at least one data processing engine.
  - 44. The apparatus of claim 28 wherein the member comprises the reconfigurable logic device, wherein at least a portion of the multi-functional pipeline resides on the reconfigurable logic device.
  - 45. The apparatus of claim 44 wherein the member further comprises the GPU, and wherein another portion of the multifunctional pipeline resides on the GPU.
  - 46. The apparatus of claim 44 wherein the member further comprises the CMP, and wherein another portion of the multifunctional pipeline resides on the CMP.
  - 47. The apparatus of claim 44 wherein the re-configurable logic device comprises a field programmable gate array (FPGA), wherein at least a portion of the multi-functional pipeline resides on the FPGA.
  - 48. The apparatus of claim 28 wherein the member comprises the GPU, wherein at least a portion of the multi-functional pipeline resides on the GPU.
  - 49. The apparatus of claim 48 wherein the member further comprises the CMP, and wherein another portion of the multifunctional pipeline resides on the CMP.
  - 50. The apparatus of claim 28 wherein the member comprises the CMP, wherein at least a portion of the multi-functional pipeline resides on the CMP.
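
Claims 37 through 40 describe a data reduction engine that selects among max pooling, averaging, and sampling logic in response to a data reduction control instruction. A minimal sketch of that selection follows. The names `ReductionMode` and `reduce_stream` are hypothetical, the window is assumed to be non-overlapping, and "sampling" is assumed here to mean keeping the first value of each block, which the claims do not specify.

```python
from enum import Enum
from typing import List, Sequence


class ReductionMode(Enum):
    """Stand-in for the data reduction control instruction."""
    MAX_POOL = "max_pool"
    AVERAGE = "average"
    SAMPLE = "sample"


def reduce_stream(data: Sequence[float], width: int,
                  mode: ReductionMode) -> List[float]:
    """Reduce each non-overlapping block of `width` values to a single value."""
    if width <= 0:
        raise ValueError("width must be positive")
    blocks = [data[i:i + width]
              for i in range(0, len(data) - width + 1, width)]
    if mode is ReductionMode.MAX_POOL:
        return [max(block) for block in blocks]
    if mode is ReductionMode.AVERAGE:
        return [sum(block) / len(block) for block in blocks]
    if mode is ReductionMode.SAMPLE:
        # Assumption: sampling keeps the first element of each block.
        return [block[0] for block in blocks]
    raise ValueError(f"unsupported mode: {mode}")


values = [1, 4, 2, 8, 6, 3]
pooled = reduce_stream(values, 2, ReductionMode.MAX_POOL)
averaged = reduce_stream(values, 2, ReductionMode.AVERAGE)
sampled = reduce_stream(values, 2, ReductionMode.SAMPLE)
```

All three modes shrink the six input values to three outputs; only the rule for collapsing each block differs.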

Current Assignee: IP Reservoir, LLC
Original Assignee: IP Reservoir, LLC
Inventors: Chamberlain, Roger D.; Indeck, Ronald S.
Primary Examiner(s): Ly, Anh
Application Number: US16/795,016
Publication Number: US 20200184378A1
Time in Patent Office: 279 Days
CPC Class Codes:
- G06F 15/7867: with reconfigurable archite...
- G06F 16/2455: Query execution
- G06F 17/00: Digital computing or data p...
- G06F 21/602: Providing cryptographic fac...
- G06F 21/72: in cryptographic circuits
- G06F 21/76: in application-specific int...
- G06F 21/85: interconnection devices, e....
- G06F 3/0601: Interfaces specially adapte...
- G06F 3/061: Improving I/O performance
- G06F 3/0655: Vertical data movement, i.e...
- G06F 3/067: Distributed or networked st...
- G06F 3/0673: Single storage device
- G06F 3/0683: Plurality of storage devices
- G06F 9/44505: Configuring for program ini...
- G06F 9/5061: Partitioning or combining o...
- G06N 20/00: Machine learning
- G06N 3/045: Combinations of networks
- G06N 3/063: using electronic means
- G06N 3/08: Learning methods
- H04L 9/14: using a plurality of keys o...
