DEEP NEURAL NETWORK PARTITIONING ON SERVERS

US 20160379108A1
Filed: 06/29/2015
Published: 12/29/2016
Est. Priority Date: 06/29/2015
Status: Active Grant



Abstract

A method is provided for implementing a deep neural network on a server component that includes a host component including a CPU and a hardware acceleration component coupled to the host component. The deep neural network includes a plurality of layers. The method includes partitioning the deep neural network into a first segment and a second segment, the first segment including a first subset of the plurality of layers, the second segment including a second subset of the plurality of layers, configuring the host component to implement the first segment, and configuring the hardware acceleration component to implement the second segment.
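The partitioning described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Layer` type, the `ACCELERATOR_KINDS` policy set, and the example network are all hypothetical names chosen here. The policy of sending convolutional, non-linear, and pooling layers to the accelerator while keeping linear layers on the host mirrors the arrangement recited in the dependent claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    kind: str  # hypothetical labels: "conv", "relu", "pool", "linear"


# Assumption: layer kinds that the hardware acceleration component implements.
ACCELERATOR_KINDS = {"conv", "relu", "pool"}


def partition(layers):
    """Split the layers into (host_segment, accelerator_segment).

    The first segment (host) receives every layer whose kind is not in
    ACCELERATOR_KINDS; the second segment (accelerator) receives the rest.
    The original layer order is preserved within each segment.
    """
    host_segment = [l for l in layers if l.kind not in ACCELERATOR_KINDS]
    accel_segment = [l for l in layers if l.kind in ACCELERATOR_KINDS]
    return host_segment, accel_segment


# Illustrative network only; not taken from the patent.
network = [
    Layer("conv1", "conv"),
    Layer("relu1", "relu"),
    Layer("pool1", "pool"),
    Layer("conv2", "conv"),
    Layer("fc1", "linear"),
    Layer("fc2", "linear"),
]

host_segment, accel_segment = partition(network)
print("host:", [l.name for l in host_segment])
print("accelerator:", [l.name for l in accel_segment])
```

Each segment is a subset of the layers, and together the two segments cover the whole network, which is the property the abstract relies on.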


20 Claims

1. A method for implementing a deep neural network on a server component that comprises a host component including a CPU and a hardware acceleration component coupled to the host component, the deep neural network comprising a plurality of layers, the method comprising:
- partitioning the deep neural network into a first segment and a second segment, the first segment comprising a first subset of the plurality of layers, the second segment comprising a second subset of the plurality of layers;
  
  configuring the host component to implement the first segment; and
  
  configuring the hardware acceleration component to implement the second segment.
- - 2. The method of claim 1, wherein:
    - the plurality of layers comprises a linear layer and a convolutional layer;
      
      the first segment comprises the linear layer; and
      
      the second segment comprises the convolutional layer.
  - 3. The method of claim 1, wherein:
    - the plurality of layers comprises a linear layer and a plurality of convolutional layers;
      
      the first segment comprises the linear layer; and
      
      the second segment comprises the plurality of convolutional layers.
  - 4. The method of claim 3, wherein the plurality of layers further comprises a non-linear function and a pooling layer, and the second segment comprises the non-linear function and a pooling layer.
  - 5. The method of claim 1, wherein:
    - the plurality of layers comprises a first layer having a first memory bandwidth requirement, and a second layer having a second memory bandwidth requirement;
      
      the first segment comprises the first layer; and
      
      the second segment comprises the second layer.
  - 6. The method of claim 1, wherein the hardware acceleration component comprises one or more of a field-programmable gate array device, a massively parallel processor array device, a graphics processing unit, and an application-specific integrated circuit.
  - 7. The method of claim 1, wherein the server component comprises a data center server component.
  - 8. The method of claim 1, further comprising configuring the hardware acceleration component to implement a multi-layer convolutional neural network.
  - 9. The method of claim 1, wherein the hardware acceleration component comprises a reconfigurable array of N rows and M columns of functional units, each configured to perform a convolution of the input data and weights data.
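Claim 5 partitions layers according to their memory bandwidth requirements. A rough sketch of one way to do that is shown below. The claim does not say which requirement is larger or how the split is decided, so the threshold rule, the direction of the split (bandwidth-heavy layers stay on the host), the `LayerProfile` type, and all numbers are assumptions made for illustration.

```python
from typing import NamedTuple


class LayerProfile(NamedTuple):
    name: str
    bandwidth_gb_s: float  # hypothetical memory bandwidth requirement


def partition_by_bandwidth(layers, threshold_gb_s):
    """Return (first_segment, second_segment).

    Assumption: layers needing more than threshold_gb_s go to the first
    segment (host); all other layers go to the second segment (accelerator).
    """
    first_segment = [l for l in layers if l.bandwidth_gb_s > threshold_gb_s]
    second_segment = [l for l in layers if l.bandwidth_gb_s <= threshold_gb_s]
    return first_segment, second_segment


# Made-up profile values, for illustration only.
profiles = [
    LayerProfile("conv1", 2.0),
    LayerProfile("conv2", 3.5),
    LayerProfile("fc1", 40.0),
]

first_segment, second_segment = partition_by_bandwidth(profiles, threshold_gb_s=10.0)
print("host:", [l.name for l in first_segment])
print("accelerator:", [l.name for l in second_segment])
```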

10. A server component configured to implement a deep neural network comprising a plurality of layers, the server component comprising:
- a host component comprising a CPU;
  
  a hardware acceleration component coupled to the host component;
  
  a controller component configured to:
  
  partition the deep neural network into a first segment and a second segment, the first segment comprising a first subset of the plurality of layers, the second segment comprising a second subset of the plurality of layers;
  
  configure the host component to implement the first segment; and
  
  configure the hardware acceleration component to implement the second segment.
- - 11. The server component of claim 10, wherein:
    - the plurality of layers comprises a linear layer and a convolutional layer;
      
      the first segment comprises the linear layer; and
      
      the second segment comprises the convolutional layer.
  - 12. The server component of claim 10, wherein:
    - the plurality of layers comprises a linear layer and a plurality of convolutional layers;
      
      the first segment comprises the linear layer; and
      
      the second segment comprises the plurality of convolutional layers.
  - 13. The server component of claim 12, wherein the plurality of layers further comprises a non-linear function and a pooling layer, and the second segment comprises the non-linear function and a pooling layer.
  - 14. The server component of claim 10, wherein:
    - the plurality of layers comprises a first layer having a first memory bandwidth requirement, and a second layer having a second memory bandwidth requirement;
      
      the first segment comprises the first layer; and
      
      the second segment comprises the second layer.
  - 15. The server component of claim 10, wherein the hardware acceleration component comprises one or more of a field-programmable gate array device, a massively parallel processor array device, a graphics processing unit, and an application-specific integrated circuit.
  - 16. The server component of claim 10, wherein the server component comprises a data center server component.
  - 17. The server component of claim 10, wherein the controller component configures the hardware acceleration component to implement a multi-layer convolutional neural network.
  - 18. The server component of claim 10, wherein the hardware acceleration component comprises a reconfigurable array of N rows and M columns of functional units, each configured to perform a convolution of the input data and weights data.
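Claims 9 and 18 recite an array of N rows and M columns of functional units, each performing a convolution of input data and weights data. The sketch below models that arrangement in software. How inputs and weights are mapped onto the array is not stated in the claims, so the mapping used here (row r receives input vector r, column c holds kernel c) is an assumption, as are the function names. The "convolution" is computed without flipping the kernel, as is conventional in neural networks.

```python
def convolve_valid(signal, kernel):
    """1-D neural-network-style convolution (no kernel flip, 'valid' mode)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def run_array(inputs, kernels):
    """Model an N x M array of functional units.

    Assumption: unit (r, c) convolves inputs[r] with kernels[c], so
    N = len(inputs) and M = len(kernels). Returns an N x M grid of outputs.
    """
    return [[convolve_valid(x, w) for w in kernels] for x in inputs]


# Illustrative data: N = 2 input vectors, M = 3 kernels.
inputs = [[1, 2, 3, 4], [0, 1, 0, 1]]
kernels = [[1, 1], [1, -1], [2, 0]]

outputs = run_array(inputs, kernels)
print(outputs[0][0])  # unit (0, 0): [3, 5, 7]
print(outputs[1][2])  # unit (1, 2): [0, 2, 0]
```

In hardware the units would operate in parallel; the nested list comprehension only reproduces the results, not the timing.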

19. A method for implementing a deep neural network on a server component that comprises a host component including a CPU and a hardware acceleration component coupled to the host component, the deep neural network comprising a plurality of linear layers and a plurality of convolutional layers, the method comprising:
- configuring the host component to implement the linear layers; and
  
  configuring the hardware acceleration component to implement the convolutional layers.
- - 20. The method of claim 19, wherein the hardware acceleration component comprises one or more of a field-programmable gate array device, a massively parallel processor array device, a graphics processing unit, and an application-specific integrated circuit.


Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors
Ovtcharov, Kalin; Ruwase, Olatunji; Chung, Eric; Strauss, Karin; Kim, Joo-Young

Granted Patent

US 10,452,971 B2
US Class Current

1/1
CPC Class Codes

G06N 3/04   Architecture, e.g. intercon...

G06N 3/045   Combinations of networks

G06N 3/063   using electronic means
