Data-optimized neural network traversal

US 10,417,555 B2
Filed: 05/06/2016
Issued: 09/17/2019
Est. Priority Date: 05/29/2015
Status: Active Grant

First Claim

1. A method of executing a neural network by a processor that includes a neural network engine circuit having an internal memory, comprising:

generating a first output tile of a first layer of the neural network by processing an input tile to the first layer, a size of the first output tile is based at least in part on a size of the internal memory;

storing the first output tile of the first layer in the internal memory; and

processing, by the neural network engine circuit, and using the first output tile, a plurality of adjacent layers of the neural network, wherein the plurality of adjacent layers is grouped and partitioned into one or more frustums and a rectangular intersection of the frustums determines at least in part the size of the first output tile;

generating, using the neural network engine circuit, a second output tile of a second output layer of the frustum of the neural network by processing the first output tile of the first layer stored in the internal memory.
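
As an illustration of how a tile size might be derived from the size of the internal memory, the following is a minimal sketch and not the patented method. The function name `max_tile_side`, the square-tile restriction, the 2-byte value width, and the `reserve_bytes` parameter are all assumptions introduced for this example. The claim also ties the tile size to a rectangular intersection of frustums, which is not modeled here.

```python
import math


def max_tile_side(internal_bytes, num_feature_maps, bytes_per_value=2, reserve_bytes=0):
    """Return the largest square tile side (in pixels) whose
    side x side x num_feature_maps volume fits in the memory budget.

    All parameters are hypothetical: reserve_bytes stands for memory set
    aside for weights or other buffers in this sketch.
    """
    budget = internal_bytes - reserve_bytes
    if budget <= 0 or num_feature_maps <= 0:
        return 0
    bytes_per_pixel = num_feature_maps * bytes_per_value
    # side * side * bytes_per_pixel <= budget
    return math.isqrt(budget // bytes_per_pixel)


if __name__ == "__main__":
    # 64 KiB of internal memory, 64 feature maps, 16-bit values
    print(max_tile_side(64 * 1024, 64))
```

A real implementation would likely allow non-square tiles and account for the output tiles of every layer held in memory at the same time.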

1 Assignment

0 Petitions

Abstract

Executing a neural network includes generating an output tile of a first layer of the neural network by processing an input tile to the first layer and storing the output tile of the first layer in an internal memory of a processor. An output tile of a second layer of the neural network can be generated using the processor by processing the output tile of the first layer stored in the internal memory.
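
The idea in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, under several assumptions that are not in the source: each layer is a single-feature-map 3x3 "valid" convolution, the "internal memory" is modeled as a local variable, and the names `conv3x3_valid`, `run_layer_by_layer`, and `run_fused_tiles` are hypothetical. The point is that the output tile of the first layer is consumed by the second layer straight away, so the full intermediate feature map is never materialized.

```python
def conv3x3_valid(tile, kernel):
    """3x3 'valid' convolution of a 2-D list; output shrinks by 2 per dimension."""
    h, w = len(tile), len(tile[0])
    return [[sum(tile[y + dy][x + dx] * kernel[dy][dx]
                 for dy in range(3) for dx in range(3))
             for x in range(w - 2)]
            for y in range(h - 2)]


def run_layer_by_layer(image, k1, k2):
    """Baseline: compute the whole first-layer output, then the second layer."""
    return conv3x3_valid(conv3x3_valid(image, k1), k2)


def run_fused_tiles(image, k1, k2, tile=4):
    """Process both layers one tile at a time.

    For a second-layer output tile of th x tw, the first-layer output tile is
    (th + 2) x (tw + 2) and the input tile is (th + 4) x (tw + 4).
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h - 4, w - 4
    out = [[0] * out_w for _ in range(out_h)]
    for ty in range(0, out_h, tile):
        for tx in range(0, out_w, tile):
            th = min(tile, out_h - ty)
            tw = min(tile, out_w - tx)
            in_tile = [row[tx:tx + tw + 4] for row in image[ty:ty + th + 4]]
            l1_tile = conv3x3_valid(in_tile, k1)  # stands in for internal memory
            l2_tile = conv3x3_valid(l1_tile, k2)  # computed from the stored tile
            for y in range(th):
                out[ty + y][tx:tx + tw] = l2_tile[y]
    return out


if __name__ == "__main__":
    image = [[(3 * y + 5 * x) % 7 for x in range(9)] for y in range(10)]
    k1 = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    k2 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    print(run_fused_tiles(image, k1, k2) == run_layer_by_layer(image, k1, k2))
```

In this sketch neighboring tiles recompute the overlapping border of the first-layer output. That is a simplification chosen for brevity, not a statement about how the patent handles tile boundaries.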

20 Claims

1. A method of executing a neural network by a processor that includes a neural network engine circuit having an internal memory, comprising:
- generating a first output tile of a first layer of the neural network by processing an input tile to the first layer, a size of the first output tile is based at least in part on a size of the internal memory;
  
  storing the first output tile of the first layer in the internal memory; and
  
  processing, by the neural network engine circuit, and using the first output tile, a plurality of adjacent layers of the neural network, wherein the plurality of adjacent layers is grouped and partitioned into one or more frustums and a rectangular intersection of the frustums determines at least in part the size of the first output tile;
  
  generating, using the neural network engine circuit, a second output tile of a second output layer of the frustum of the neural network by processing the first output tile of the first layer stored in the internal memory.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein each tile comprises of a portion of each feature map of a plurality feature maps such that the tile have a three-dimensional profile that includes a height, a width and a number of feature maps.
  - 3. The method of claim 1, wherein the neural network is partitioned into a plurality of frustums, wherein each frustum is processed independently.
  - 4. The method of claim 3, wherein the processor comprises a plurality of compute units that are configured to process the plurality of frustums in parallel.
  - 5. The method of claim 1, wherein the first layer and the second layer are feature extraction layers configured to process a plurality of images to generate a plurality of output feature maps, the method further comprising:
    - processing the plurality of output feature maps for the plurality of images through a feature classification layer of the neural network in batch.
  - 6. The method of claim 5, wherein the processing the plurality of output feature maps of the plurality of images through the feature classification layer comprises:
    - loading a first plurality of weights of the feature classification layer from an external memory into the internal memory of the processor; and
      
      processing each of the plurality of output feature maps using the first plurality of weights of the feature classification layer prior to loading, from the external memory, a second plurality of weights of the feature classification layer or weights of a next feature classification layer.
  - 7. The method of claim 6, further comprising:
    - responsive to the processing of each of the plurality of output feature maps using the first plurality of weights of the feature classification layer, loading the second plurality of weights of the feature classification layer into the internal memory;
      
      wherein the second plurality of weights for the feature classification layer overwrite the first plurality of weights for the feature classification layer.
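
The batching described in claims 5 through 7 can be sketched as follows. This is a hedged illustration, not the claimed implementation: the function `classify_batch`, the representation of weights as lists of rows, and the `load_log` parameter are assumptions made for this example. Each group of weights is "loaded" once, applied to the feature maps of every image in the batch, and only then replaced by the next group.

```python
def classify_batch(feature_vectors, weight_chunks, load_log=None):
    """Apply a fully connected layer to a batch, one weight chunk at a time.

    feature_vectors: one flattened feature vector per image.
    weight_chunks:   the layer's weights split into chunks; each chunk is a
                     list of rows, one row per output neuron (hypothetical layout).
    load_log:        optional list that records each simulated load from
                     external memory.
    """
    outputs = [[] for _ in feature_vectors]
    internal = None  # models the internal memory region that holds weights
    for chunk_id, chunk in enumerate(weight_chunks):
        # Simulated load from external memory; overwrites the previous chunk.
        internal = [row[:] for row in chunk]
        if load_log is not None:
            load_log.append(chunk_id)
        # Use this chunk for every image before loading the next chunk.
        for i, vec in enumerate(feature_vectors):
            for row in internal:
                outputs[i].append(sum(w * v for w, v in zip(row, vec)))
    return outputs


if __name__ == "__main__":
    log = []
    features = [[1, 2], [3, 4], [5, 6]]
    chunks = [[[1, 0], [0, 1]], [[1, 1]]]
    print(classify_batch(features, chunks, log), log)
```

With three images and two weight chunks, the sketch performs two loads. Processing one image at a time would instead reload each chunk for each image, for six loads in total.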

8. An apparatus comprising a processor configured to execute a neural network, the processor comprising:
- an internal memory;
  
  a first compute unit coupled to the internal memory and configured to perform executable operations including;
  
  generating a first output tile of a first layer of the neural network by processing an input tile to the first layer, wherein a size of the first output tile is determined at least in part by a size of the internal memory;
  
  storing the first output tile of the first layer in an internal memory of a processor; and
  
  processing, by the processor and using the first output tile, a plurality of adjacent layers of the neural network, wherein the plurality of adjacent layers is grouped and partitioned into one or more frustums and wherein a rectangular intersection of the frustums determines at least in part the size of the first output tile;
  
  generating, using the processor, a second output tile of a second output layer of the frustum of the neural network by processing the first output tile of the first layer stored in the internal memory.
- Dependent claims (9, 10, 11, 12, 13, 14)
  - 9. The apparatus of claim 8, wherein each tile comprises of a portion of each feature map of a plurality feature maps such that the tile have a three-dimensional profile that includes a height, a width and a number of feature maps.
  - 10. The apparatus of claim 8, wherein the neural network is partitioned into a plurality of frustums, wherein each frustum is processed independently.
  - 11. The apparatus of claim 10, wherein the processor comprises a plurality of compute units that are configured to process the plurality of frustums in parallel.
  - 12. The apparatus of claim 8, wherein the first layer and the second layer are feature extraction layers configured to process a plurality of images to generate a plurality of output feature maps, wherein the first compute unit is configured to initiate executable operations further comprising:
    - processing the plurality of output feature maps for the plurality of images through a feature classification layer of the neural network in batch.
  - 13. The apparatus of claim 12, further comprising:
    - an external memory coupled to the first compute unit;
      
      wherein the processing the plurality of output feature maps for the plurality of images through the feature classification layer comprises loading a first plurality of weights of the feature classification layer from the external memory into the internal memory and processing each of the plurality of output feature maps using the first plurality of weights of the feature classification layer prior to loading, from the external memory, a second plurality of weights for the feature classification layer or weights of a next feature classification layer.
  - 14. The apparatus of claim 13, wherein the first compute unit is programmed to initiate executable operations further comprising:
    - responsive to the processing of each of the plurality of output feature maps using the first plurality of weights of the feature classification layer, loading the second plurality of weights of the feature classification layer into the internal memory;
      
      wherein the second plurality of weights for the feature classification layer overwrite the first plurality of weights for the feature classification layer.
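
Claims 10 and 11 describe frustums that are processed independently and, with multiple compute units, in parallel. The sketch below illustrates that independence on a one-dimensional signal. It is an assumption-laden example rather than the apparatus itself: worker threads stand in for compute units, each layer is a 3-tap "valid" sum, and the names `layer`, `run_frustum`, and `run_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def layer(xs):
    """3-tap 'valid' sum; the output is 2 elements shorter than the input."""
    return [xs[i] + xs[i + 1] + xs[i + 2] for i in range(len(xs) - 2)]


def run_frustum(signal, start, stop):
    """Compute final outputs [start, stop) through two layers.

    The frustum widens toward the input: outputs [start, stop) need
    input elements [start, stop + 4).
    """
    in_tile = signal[start:stop + 4]
    return layer(layer(in_tile))


def run_parallel(signal, tile=4, workers=2):
    """Split the output range into frustums and process them concurrently."""
    out_len = len(signal) - 4
    bounds = [(s, min(s + tile, out_len)) for s in range(0, out_len, tile)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves the order of bounds, so results can be concatenated.
        parts = list(pool.map(lambda b: run_frustum(signal, *b), bounds))
    return [v for part in parts for v in part]


if __name__ == "__main__":
    signal = list(range(11))
    print(run_parallel(signal) == layer(layer(signal)))
```

Because each frustum reads only its own slice of the input and writes only its own slice of the output, no synchronization between workers is needed until the results are stitched together.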

15. A computer program product comprising a computer readable storage medium having program code stored thereon for executing a neural network, the program code executable by a processor to perform operations comprising:
- generating a first output tile of a first layer of the neural network by processing an input tile to the first layer, wherein a size of the first output tile is determined at least in part by a size of the internal memory;
  
  storing the first output tile of the first layer in an internal memory of a processor; and
  
  processing, by the processor and using the first output tile, a plurality of adjacent layers of the neural network, wherein the plurality of adjacent layers is grouped and partitioned into at least one frustum and a rectangular intersection of the frustums determines at least in part the size of the first output tile;
  
  generating, using the processor, a second output tile of a second output layer of the frustum of the neural network by processing the first output tile of the first layer stored in the internal memory.
- Dependent claims (16, 17, 18, 19, 20)
  - 16. The computer program product of claim 15, wherein each tile comprises of a portion of each feature map of a plurality feature maps such that the tile have a three-dimensional profile that includes a height, a width and a number of feature maps.
  - 17. The computer program product of claim 15, wherein the neural network is partitioned into a plurality of frustums, wherein each frustum is processed independently.
  - 18. The computer program product of claim 15, wherein the first layer and the second layer are feature extraction layers configured to process a plurality of images to generate a plurality of output feature maps, wherein the program code is executable by the processor to perform operations further comprising:
    - processing the plurality of output feature maps for the plurality of images through a feature classification layer of the neural network in batch.
  - 19. The computer program product of claim 18, wherein the processing the plurality of output feature maps of the plurality of images through the feature classification layer comprises:
    - loading a first plurality of weights of the feature classification layer from an external memory into the internal memory of the processor; and
      
      processing each of the plurality of output feature maps using the first plurality of weights of the feature classification layer prior to loading, from the external memory, a second plurality of weights of the feature classification layer or weights of a next feature classification layer.
  - 20. The computer program product of claim 19, wherein the program code is executable by the processor to perform operations further comprising:
    - responsive to the processing of each of the plurality of output feature maps using the first plurality of weights of the feature classification layer, loading the second plurality of weights of the feature classification layer into the internal memory;
      
      wherein the second plurality of weights for the feature classification layer overwrite the first plurality of weights for the feature classification layer.

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Samsung Electronics Co. Ltd.
Inventors: Brothers, John W.; Lee, Joohoon
Primary Examiner(s): Gonzales, Vincent
Assistant Examiner(s): Raker, Seth Andrew
Application Number: US15/148,627
Publication Number: US 20160350645A1
Time in Patent Office: 1,229 Days

CPC Class Codes

G06N 3/04   Architecture, e.g. intercon...
G06N 3/045   Combinations of networks
G06N 3/08   Learning methods
G06N 3/10   Interfaces, programming lan...
G06T 1/60   Memory management
