
Asynchronous stochastic gradient descent

  • US 10,628,740 B2
  • Filed: 05/05/2016
  • Issued: 04/21/2020
  • Est. Priority Date: 10/02/2015
  • Status: Active Grant
First Claim

1. A computer-implemented method for asynchronous stochastic gradient descent, the method comprising:

  • computing, by a generator processor on each of a plurality of learners, a gradient for a mini-batch using a current weight at each of the plurality of learners, the current weight being uniquely identified by a weight index of each of the plurality of learners, wherein the plurality of learners are arranged in a peer-to-peer arrangement without a parameter server;

  • generating, by the generator processor on each of the plurality of learners, a plurality of triples, wherein each of the triples comprises the gradient, the weight index of the current weight used to compute the gradient, and a mass of the gradient, the mass equaling a number of mini-batches used to generate the gradient times a number of observations in the mini-batch;

  • performing, by a reconciler processor on each of the plurality of learners, an allreduce operation on the plurality of triples to obtain an allreduced triple sequence; and

  • updating, by the reconciler processor on each of the plurality of learners, the current weight at each of the plurality of learners to a new current weight using the allreduced triple sequence, wherein the new current weight becomes the current weight for a next processing batch to be computed by the generator processor.
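
In engineering terms, claim 1 describes a generator/reconciler loop at each learner: compute a gradient, package it as a (gradient, weight index, mass) triple, allreduce the triples across the peers, and apply the reduced sequence to the weights. The following single-process NumPy sketch simulates that flow across four learners; the toy regression loss, the learning rate, the simulated allreduce, and the mass-weighted averaging rule are illustrative assumptions, not the patented implementation (a real peer-to-peer deployment would use a collective such as MPI_Allreduce instead of the in-process loop).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression problem shared by all learners (an assumption;
# the claim itself is model-agnostic).
d = 5
w_true = rng.normal(size=d)

def make_minibatch(n=32):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    return X, y

def gradient(w, X, y):
    # Mean-squared-error gradient for the stand-in linear model.
    return 2.0 * X.T @ (X @ w - y) / len(y)

n_learners = 4
weights = [np.zeros(d) for _ in range(n_learners)]
weight_index = [0] * n_learners  # uniquely identifies each learner's current weight

for step in range(200):
    # Generator phase: each learner emits a (gradient, weight index, mass)
    # triple; per the claim, mass = (mini-batches used to generate the
    # gradient) x (observations per mini-batch), i.e. 1 x 32 here.
    triples = []
    for i in range(n_learners):
        X, y = make_minibatch()
        g = gradient(weights[i], X, y)
        triples.append((g, weight_index[i], 1 * len(y)))

    # Reconciler phase: a simulated allreduce; after the collective, every
    # learner holds the same allreduced triple sequence.
    reduced = list(triples)

    # Weight update: a mass-weighted average of the reduced gradients is one
    # plausible reconciliation rule (an assumption; the claim only says the
    # new weight is computed "using the allreduced triple sequence").
    total_mass = sum(m for _, _, m in reduced)
    g_avg = sum(m * g for g, _, m in reduced) / total_mass
    for i in range(n_learners):
        weights[i] -= 0.05 * g_avg
        weight_index[i] += 1  # the new weight becomes current for the next batch

print("distance to w_true:", np.linalg.norm(weights[0] - w_true))
```

Because every simulated learner applies the same reduced sequence, the weight indices stay synchronized here; in the asynchronous setting the patent targets, learners may reduce triples computed against different (stale) weight indices, which is what the weight-index and mass fields of each triple let the reconciler account for.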
