MEMORY SIDE ACCELERATION FOR DEEP LEARNING PARAMETER UPDATES
First Claim
1. A computing system comprising:
a plurality of processing nodes;
a globally addressable memory that is addressable by each of the processing nodes;
a plurality of memory side accelerators each associated with a portion of the globally addressable memory, wherein each memory side accelerator includes a scratchpad memory;
a plurality of deep learning parameters stored in the globally addressable memory, wherein each memory side accelerator is associated with a partition of the deep learning parameters;
a plurality of deep learning worker threads executing on the respective processing nodes to each calculate gradient updates based on corresponding subsets of training information, wherein each memory side accelerator is to receive a plurality of the calculated gradient updates associated with the respective partition and calculate updated deep learning parameters for the respective partition using the corresponding scratchpad memory.
Abstract
Examples disclosed herein relate to using a memory side accelerator to calculate updated deep learning parameters. A globally addressable memory includes deep learning parameters. The deep learning parameters are partitioned, where each partition is associated with a memory side accelerator. A memory side accelerator is to receive calculated gradient updates associated with its partition and calculate an update to the deep learning parameters associated with the partition.
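The abstract and first claim describe a parameter-server-style split: the parameter set lives in a globally addressable memory, each memory side accelerator owns one partition of it, and worker threads push gradients computed from their own subsets of the training data. A minimal Python sketch of that data flow follows; the class names, the averaged-SGD update rule, and the two-partition layout are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

class MemorySideAccelerator:
    """Illustrative stand-in for one accelerator: it owns a partition of the
    parameters and applies updates in a local scratchpad copy."""
    def __init__(self, partition, lr=0.01):
        self.scratchpad = partition.copy()   # working copy held in scratchpad memory
        self.lr = lr
        self.pending = []                    # gradient updates received from workers

    def receive_gradient(self, grad_slice):
        self.pending.append(grad_slice)

    def apply_updates(self):
        # Average the gradients received for this partition and take one SGD step.
        self.scratchpad -= self.lr * np.mean(self.pending, axis=0)
        self.pending.clear()
        return self.scratchpad               # result is written back to global memory

def worker_gradient(params, X, y):
    # Stand-in for a worker thread's backpropagation over its training subset
    # (here: the gradient of a least-squares loss for a linear model).
    return X.T @ (X @ params - y) / len(y)

# Parameters live in the globally addressable memory, split into one
# contiguous partition per memory side accelerator.
params = np.zeros(8)
bounds = [(0, 4), (4, 8)]
accelerators = [MemorySideAccelerator(params[lo:hi]) for lo, hi in bounds]

# Each worker computes a gradient on its own subset of the training data and
# ships each slice of it to the accelerator that owns that partition.
for X, y in [(np.ones((2, 8)), np.ones(2)), (2 * np.ones((2, 8)), np.zeros(2))]:
    grad = worker_gradient(params, X, y)
    for (lo, hi), acc in zip(bounds, accelerators):
        acc.receive_gradient(grad[lo:hi])

# Each accelerator updates its partition independently and publishes it back.
for (lo, hi), acc in zip(bounds, accelerators):
    params[lo:hi] = acc.apply_updates()
```

The point of the arrangement, as the claims describe it, is that the update arithmetic runs next to the memory holding each partition: workers ship only gradients rather than reading and rewriting the parameters themselves.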
19 Claims
1. A computing system comprising:
a plurality of processing nodes;
a globally addressable memory that is addressable by each of the processing nodes;
a plurality of memory side accelerators each associated with a portion of the globally addressable memory, wherein each memory side accelerator includes a scratchpad memory;
a plurality of deep learning parameters stored in the globally addressable memory, wherein each memory side accelerator is associated with a partition of the deep learning parameters;
a plurality of deep learning worker threads executing on the respective processing nodes to each calculate gradient updates based on corresponding subsets of training information, wherein each memory side accelerator is to receive a plurality of the calculated gradient updates associated with the respective partition and calculate updated deep learning parameters for the respective partition using the corresponding scratchpad memory.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
10. A method comprising:
storing a plurality of deep learning parameters in a globally addressable memory that is addressable by each of a plurality of processing nodes, wherein the globally addressable memory includes a plurality of portions, wherein each of the portions are coupled to a memory controller that includes a memory side accelerator, wherein each of the memory side accelerators is associated with a partition of the deep learning parameters, and wherein the deep learning parameters are included in a tensor of floating point numbers and each partition includes a subset of the tensor;
calculating gradient updates at deep learning worker threads based on corresponding subsets of training information at respective processing nodes;
receiving, at each memory side accelerator, a plurality of the calculated gradient updates associated with the respective partition; and
calculating, at each memory side accelerator, an updated deep learning parameter for the respective partition.
View Dependent Claims (11, 12, 13, 14, 15)
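Claim 10 adds that the parameters form a single tensor of floating point numbers and that each partition is a subset of that tensor, with each portion of memory sitting behind a memory controller that contains the accelerator. One simple, purely illustrative way to realize that mapping is to flatten the tensor and hand each accelerator a contiguous index range, so every gradient element routes to exactly one accelerator; the function names and the equal-range scheme below are assumptions, not the patent's partitioning method.

```python
import numpy as np

def partition_tensor(tensor, num_accelerators):
    """Split a flattened parameter tensor into contiguous, near-equal index
    ranges, one per memory side accelerator (illustrative scheme only)."""
    flat = tensor.reshape(-1)
    edges = np.linspace(0, flat.size, num_accelerators + 1, dtype=int)
    return [(int(edges[i]), int(edges[i + 1])) for i in range(num_accelerators)]

def owner_of(index, bounds):
    # Route a single gradient element to the accelerator that owns its index.
    for accel_id, (lo, hi) in enumerate(bounds):
        if lo <= index < hi:
            return accel_id
    raise IndexError(index)

weights = np.zeros((256, 128), dtype=np.float32)   # the tensor of floating point numbers
bounds = partition_tensor(weights, num_accelerators=4)
print(bounds)                    # [(0, 8192), (8192, 16384), (16384, 24576), (24576, 32768)]
print(owner_of(20000, bounds))   # -> 2
```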
16. A non-transitory machine-readable storage medium storing instructions that, if executed by a physical processing element of a first memory side accelerator of a computing system, cause the first memory side accelerator to:
retrieve a partition of a plurality of deep learning parameters stored in a globally addressable memory that is addressable by each of a plurality of processing nodes of the computing system, wherein the globally addressable memory includes a plurality of portions each respectively coupled to at least one of a plurality of memory side accelerators, including the first memory side accelerator; wherein each of the memory side accelerators is associated with a partition of the deep learning parameters, and wherein the deep learning parameters are included in a tensor of floating point numbers and each partition includes a subset of the tensor;
receive a plurality of gradient updates associated with a first partition that corresponds to the first memory side accelerator from a plurality of deep learning worker threads; and
calculate a plurality of updated deep learning parameters associated with the first partition in a scratchpad memory associated with the first memory side accelerator based on the received gradient updates.
View Dependent Claims (17, 18, 19)
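Claim 16 spells out the accelerator-resident instruction sequence: retrieve the owned partition from the globally addressable memory, receive gradient updates from the worker threads, and calculate the updated parameters in the scratchpad. The sketch below mirrors those steps; the delivery queue, the write-back to a plain array standing in for global memory, and the averaged SGD step are assumptions made for illustration rather than details from the claim.

```python
import numpy as np
from queue import Queue

def accelerator_update_step(global_memory, lo, hi, gradient_queue, num_workers, lr=0.01):
    """Illustrative accelerator routine: load the owned partition into a
    scratchpad, fold in one gradient per worker, and write the result back."""
    # 1. Retrieve this accelerator's partition from the globally addressable memory.
    scratchpad = np.array(global_memory[lo:hi], copy=True)

    # 2. Receive gradient updates for this partition from the worker threads.
    grads = [gradient_queue.get() for _ in range(num_workers)]

    # 3. Calculate the updated parameters in the scratchpad (an averaged SGD
    #    step here; the claim does not mandate a specific update rule).
    scratchpad -= lr * np.mean(grads, axis=0)

    # 4. Publish the updated partition back to the globally addressable memory.
    global_memory[lo:hi] = scratchpad
    return scratchpad

# Example: two workers each push a gradient for the partition covering indices [0, 4).
memory = np.zeros(8, dtype=np.float32)
q = Queue()
q.put(np.ones(4, dtype=np.float32))
q.put(3 * np.ones(4, dtype=np.float32))
print(accelerator_update_step(memory, 0, 4, q, num_workers=2))   # -> [-0.02 -0.02 -0.02 -0.02]
```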
Specification