Deep neural network partitioning on servers

  • US 10,452,971 B2
  • Filed: 06/29/2015
  • Issued: 10/22/2019
  • Est. Priority Date: 06/29/2015
  • Status: Active Grant
First Claim

1. A system comprising:

  • multiple server units, each server unit comprising:

    a plurality of central processing units;

    a hardware acceleration processing unit coupled to a top-of-rack switch, wherein the hardware acceleration processing unit performs processing on packets received from, or sent to, the top-of-rack switch without burdening operations performed by one of the plurality of central processing units;

    a local link communicationally coupling one of the plurality of central processing units to the hardware acceleration processing unit;

    a first network interface communicationally coupled to at least one of the plurality of central processing units; and

    a second network interface different from, and independent of, the first network interface, the second network interface communicationally coupled to the hardware acceleration processing unit independently of one of the plurality of central processing units, such that a second hardware acceleration processing unit, of a second server unit of the multiple server units, communicates directly with the hardware acceleration processing unit through the second network interface, to the exclusion of communicating with one of the plurality of central processing units;

    wherein a first set of the hardware acceleration processing units are head components that calculate feature values that will be used as input for subsequent processing to be performed by a second set of the hardware acceleration processing units;

    wherein the second set of the hardware acceleration processing units are free form expression executing processors that receive the feature values from the head components and perform the subsequent processing; and

    wherein one or more of the multiple server units execute computer-executable instructions which, when executed, cause the one or more of the multiple server units to provide a service mapping component that performs steps comprising:

    assigning, to multiple head components, convolution processing associated with one or more convolution layers of a deep neural network (DNN), the multiple head components then executing the assigned convolution processing utilizing one or more free form expression executing processors to perform at least a portion of the convolution processing; and

    assigning, to one or more central processing units of one or more of the multiple server units that comprise the multiple head components to which the convolution processing was assigned, linear processing of output of the convolution processing, the one or more central processing units then executing the assigned linear processing;

    wherein the service mapping component associates a first central processing unit of a first server unit with a second hardware acceleration component of a second server unit, differing from the first server unit, in response to a failure of a first hardware acceleration component of the first server unit.
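
Read as an architecture, the claim partitions a deep neural network across server units: head accelerators compute the convolution-layer feature values, free form expression executing processors consume those features, the CPUs in the same server units handle the linear processing of the convolution output, and a service mapping component re-pairs a CPU with a remote accelerator when the local one fails. The following is a minimal Python sketch of that mapping and failover logic; every name in it (ServiceMapper, Accelerator, ServerUnit, on_accelerator_failure, and so on) is a hypothetical illustration, not terminology from the patent.

from dataclasses import dataclass

# Illustrative model of the claimed architecture; names and structure
# are assumptions for this sketch, not taken from the patent.

@dataclass
class Accelerator:
    unit_id: int
    role: str = "head"      # "head" computes feature values; "ffe" executes
                            # free form expressions over those features
    healthy: bool = True

@dataclass
class ServerUnit:
    unit_id: int
    cpus: list
    accelerator: Accelerator

class ServiceMapper:
    """Sketch of the claimed service mapping component."""

    def __init__(self, units):
        self.units = list(units)
        # Default pairing: each CPU uses the accelerator in its own server
        # unit, reached over the local link.
        self.cpu_to_accel = {u.unit_id: u.accelerator for u in self.units}

    def assign_dnn(self, conv_layers, linear_layers):
        heads = [u for u in self.units
                 if u.accelerator.role == "head" and u.accelerator.healthy]
        if not heads:
            raise RuntimeError("no healthy head accelerators")
        plan = {"conv": {}, "linear": {}}
        # Convolution layers go to head accelerators, round-robin.
        for i, layer in enumerate(conv_layers):
            plan["conv"][layer] = heads[i % len(heads)].accelerator
        # Linear processing of the convolution output stays on CPUs of the
        # same server units that host the assigned head components.
        for i, layer in enumerate(linear_layers):
            plan["linear"][layer] = heads[i % len(heads)].cpus[0]
        return plan

    def on_accelerator_failure(self, failed_unit):
        # Re-associate the failed unit's CPU with a healthy accelerator in a
        # different server unit, reachable over the second network interface
        # without involving the remote unit's CPUs.
        failed_unit.accelerator.healthy = False
        for u in self.units:
            if u.unit_id != failed_unit.unit_id and u.accelerator.healthy:
                self.cpu_to_accel[failed_unit.unit_id] = u.accelerator
                return u.accelerator
        raise RuntimeError("no healthy accelerator available for failover")

# Example: two server units, each with one CPU and one head accelerator.
units = [ServerUnit(0, ["cpu0"], Accelerator(0)),
         ServerUnit(1, ["cpu1"], Accelerator(1))]
mapper = ServiceMapper(units)
plan = mapper.assign_dnn(["conv1", "conv2"], ["fc1"])
mapper.on_accelerator_failure(units[0])  # unit 0's CPU now pairs with unit 1's accelerator

The sketch's failover step mirrors the final wherein clause: because each accelerator has its own second network interface, a CPU on one server unit can be associated with an accelerator on another unit without routing through the remote unit's CPUs.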
