MEMORY SIDE ACCELERATION FOR DEEP LEARNING PARAMETER UPDATES
First Claim
1. A computing system comprising:
a plurality of processing nodes;
a globally addressable memory that is addressable by each of the processing nodes;
a plurality of memory side accelerators each associated with a portion of the globally addressable memory, wherein each memory side accelerator includes a scratchpad memory;
a plurality of deep learning parameters stored in the globally addressable memory, wherein each memory side accelerator is associated with a partition of the deep learning parameters;
a plurality of deep learning worker threads executing on the respective processing nodes to each calculate gradient updates based on corresponding subsets of training information, wherein each memory side accelerator is to receive a plurality of the calculated gradient updates associated with the respective partition and calculate updated deep learning parameters for the respective partition using the corresponding scratchpad memory.
Abstract
Examples disclosed herein relate to using a memory side accelerator to calculate updated deep learning parameters. A globally addressable memory includes deep learning parameters. The deep learning parameters are partitioned, where each partition is associated with a memory side accelerator. A memory side accelerator is to receive calculated gradient updates associated with its partition and calculate an update to the deep learning parameters associated with the partition.
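The abstract and first claim describe a parameter-server-style split: the parameter set lives in a globally addressable memory, each memory side accelerator owns one partition of it, and worker threads push gradients computed from their own subsets of the training data. A minimal Python sketch of that data flow follows; the class names, the averaged-SGD update rule, and the two-partition layout are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

class MemorySideAccelerator:
    """Illustrative stand-in for one accelerator: it owns a partition of the
    parameters and applies updates in a local scratchpad copy."""
    def __init__(self, partition, lr=0.01):
        self.scratchpad = partition.copy()   # working copy held in scratchpad memory
        self.lr = lr
        self.pending = []                    # gradient updates received from workers

    def receive_gradient(self, grad_slice):
        self.pending.append(grad_slice)

    def apply_updates(self):
        # Average the gradients received for this partition and take one SGD step.
        self.scratchpad -= self.lr * np.mean(self.pending, axis=0)
        self.pending.clear()
        return self.scratchpad               # result is written back to global memory

def worker_gradient(params, X, y):
    # Stand-in for a worker thread's backpropagation over its training subset
    # (here: the gradient of a least-squares loss for a linear model).
    return X.T @ (X @ params - y) / len(y)

# Parameters live in the globally addressable memory, split into one
# contiguous partition per memory side accelerator.
params = np.zeros(8)
bounds = [(0, 4), (4, 8)]
accelerators = [MemorySideAccelerator(params[lo:hi]) for lo, hi in bounds]

# Each worker computes a gradient on its own subset of the training data and
# ships each slice of it to the accelerator that owns that partition.
for X, y in [(np.ones((2, 8)), np.ones(2)), (2 * np.ones((2, 8)), np.zeros(2))]:
    grad = worker_gradient(params, X, y)
    for (lo, hi), acc in zip(bounds, accelerators):
        acc.receive_gradient(grad[lo:hi])

# Each accelerator updates its partition independently and publishes it back.
for (lo, hi), acc in zip(bounds, accelerators):
    params[lo:hi] = acc.apply_updates()
```

The point of the arrangement, as the claims describe it, is that the update arithmetic runs next to the memory holding each partition: workers ship only gradients rather than reading and rewriting the parameters themselves.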
19 Claims
1. A computing system comprising:
a plurality of processing nodes;
a globally addressable memory that is addressable by each of the processing nodes;
a plurality of memory side accelerators each associated with a portion of the globally addressable memory, wherein each memory side accelerator includes a scratchpad memory;
a plurality of deep learning parameters stored in the globally addressable memory, wherein each memory side accelerator is associated with a partition of the deep learning parameters;
a plurality of deep learning worker threads executing on the respective processing nodes to each calculate gradient updates based on corresponding subsets of training information, wherein each memory side accelerator is to receive a plurality of the calculated gradient updates associated with the respective partition and calculate updated deep learning parameters for the respective partition using the corresponding scratchpad memory.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
10. A method comprising:
storing a plurality of deep learning parameters in a globally addressable memory that is addressable by each of a plurality of processing nodes, wherein the globally addressable memory includes a plurality of portions, wherein each of the portions are coupled to a memory controller that includes a memory side accelerator, wherein each of the memory side accelerators is associated with a partition of the deep learning parameters, and wherein the deep learning parameters are included in a tensor of floating point numbers and each partition includes a subset of the tensor;
calculating gradient updates at deep learning worker threads based on corresponding subsets of training information at respective processing nodes;
receiving, at each memory side accelerator, a plurality of the calculated gradient updates associated with the respective partition; and
calculating, at each memory side accelerator, an updated deep learning parameter for the respective partition.
View Dependent Claims (11, 12, 13, 14, 15)
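Claim 10 adds that the parameters form a single tensor of floating point numbers and that each partition is a subset of that tensor, with each portion of memory sitting behind a memory controller that contains the accelerator. One simple, purely illustrative way to realize that mapping is to flatten the tensor and hand each accelerator a contiguous index range, so every gradient element routes to exactly one accelerator; the function names and the equal-range scheme below are assumptions, not the patent's partitioning method.

```python
import numpy as np

def partition_tensor(tensor, num_accelerators):
    """Split a flattened parameter tensor into contiguous, near-equal index
    ranges, one per memory side accelerator (illustrative scheme only)."""
    flat = tensor.reshape(-1)
    edges = np.linspace(0, flat.size, num_accelerators + 1, dtype=int)
    return [(int(edges[i]), int(edges[i + 1])) for i in range(num_accelerators)]

def owner_of(index, bounds):
    # Route a single gradient element to the accelerator that owns its index.
    for accel_id, (lo, hi) in enumerate(bounds):
        if lo <= index < hi:
            return accel_id
    raise IndexError(index)

weights = np.zeros((256, 128), dtype=np.float32)   # the tensor of floating point numbers
bounds = partition_tensor(weights, num_accelerators=4)
print(bounds)                    # [(0, 8192), (8192, 16384), (16384, 24576), (24576, 32768)]
print(owner_of(20000, bounds))   # -> 2
```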
16. A non-transitory machine-readable storage medium storing instructions that, if executed by a physical processing element of a first memory side accelerator of a computing system, cause the first memory side accelerator to:
retrieve a partition of a plurality of deep learning parameters stored in a globally addressable memory that is addressable by each of a plurality of processing nodes of the computing system, wherein the globally addressable memory includes a plurality of portions each respectively coupled to at least one of a plurality of memory side accelerators, including the first memory side accelerator; wherein each of the memory side accelerators is associated with a partition of the deep learning parameters, and wherein the deep learning parameters are included in a tensor of floating point numbers and each partition includes a subset of the tensor;
receive a plurality of gradient updates associated with a first partition that corresponds to the first memory side accelerator from a plurality of deep learning worker threads; and
calculate a plurality of updated deep learning parameters associated with the first partition in a scratchpad memory associated with the first memory side accelerator based on the received gradient updates.
View Dependent Claims (17, 18, 19)
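Claim 16 spells out the accelerator-resident instruction sequence: retrieve the owned partition from the globally addressable memory, receive gradient updates from the worker threads, and calculate the updated parameters in the scratchpad. The sketch below mirrors those steps; the delivery queue, the write-back to a plain array standing in for global memory, and the averaged SGD step are assumptions made for illustration rather than details from the claim.

```python
import numpy as np
from queue import Queue

def accelerator_update_step(global_memory, lo, hi, gradient_queue, num_workers, lr=0.01):
    """Illustrative accelerator routine: load the owned partition into a
    scratchpad, fold in one gradient per worker, and write the result back."""
    # 1. Retrieve this accelerator's partition from the globally addressable memory.
    scratchpad = np.array(global_memory[lo:hi], copy=True)

    # 2. Receive gradient updates for this partition from the worker threads.
    grads = [gradient_queue.get() for _ in range(num_workers)]

    # 3. Calculate the updated parameters in the scratchpad (an averaged SGD
    #    step here; the claim does not mandate a specific update rule).
    scratchpad -= lr * np.mean(grads, axis=0)

    # 4. Publish the updated partition back to the globally addressable memory.
    global_memory[lo:hi] = scratchpad
    return scratchpad

# Example: two workers each push a gradient for the partition covering indices [0, 4).
memory = np.zeros(8, dtype=np.float32)
q = Queue()
q.put(np.ones(4, dtype=np.float32))
q.put(3 * np.ones(4, dtype=np.float32))
print(accelerator_update_step(memory, 0, 4, q, num_workers=2))   # -> [-0.02 -0.02 -0.02 -0.02]
```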
Specification