TRAINING MACHINE LEARNING MODELS IN DISTRIBUTED COMPUTING SYSTEMS

US 20190318240A1
Filed: 10/08/2018
Published: 10/17/2019
Est. Priority Date: 04/16/2018
Status: Abandoned Application

Abstract

Certain aspects of the present disclosure provide methods and systems for training a machine learning model, such as a neural network or deep learning model, in a distributed computing system. In some embodiments, aspects of the machine learning model are trained within containers distributed amongst nodes in the distributed computing environment.
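
The layer-to-container assignment described in the abstract and recited in claim 1 can be illustrated with a minimal sketch. Everything named here (`Container`, `assign_layers`, the node and layer names, and the round-robin policy) is a hypothetical illustration and is not taken from the application, which does not specify an assignment policy in this section.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical handle for one training container installed at a node."""
    node: str
    name: str
    layers: list = field(default_factory=list)


def assign_layers(layer_names, containers):
    """Assign each model layer to a container in round-robin order.

    Round-robin is an assumption made for illustration only.
    """
    if not containers:
        raise ValueError("at least one container is required")
    for index, layer in enumerate(layer_names):
        containers[index % len(containers)].layers.append(layer)
    return containers


if __name__ == "__main__":
    # Two containers on the same processing node, as in claim 1.
    installed = [Container("node-a", "first"), Container("node-a", "second")]
    assign_layers(["layer_1", "layer_2"], installed)
    for container in installed:
        print(container.node, container.name, container.layers)
```

With two layers and two containers, the first layer goes to the first container and the second layer to the second, which matches the arrangement recited in claim 1.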

21 Citations

20 Claims

1. A method for training a machine learning model in a distributed computing system:
- receiving a model training request;
  
  receiving a training data set;
  
  determining a processing node available in a distributed computing system;
  
  receiving static status information regarding the processing node;
  
  causing a first container to be installed at the processing node based on the static status information, the first container being configured with a model training application;
  
  causing a second container to be installed at the processing node based on the static status information, the second container being configured with the model training application;
  
  assigning a first layer of a model to be trained by the model training application in the first container;
  
  assigning a second layer of the model to be trained by the model training application in the second container;
  
  receiving parameter data from the model training application in the first container, the model training application in the second container, and the model training application in the third container; and
  
  calculating a model parameter based on the parameter data.
  - 2. The method of claim 1, further comprising:
    - assigning a first data subset to the model training application in the first container; and
      
      assigning a second data subset to the model training application in the second container.
  - 3. The method of claim 1, further comprising:
    - assigning a first data subset to the model training application in the first container; and
      
      assigning the first data subset to the model training application in the second container.
  - 4. The method of claim 1, further comprising:
    - causing a third container to be installed at the processing node based on the static status information, the third container being configured with the model training application;
      
      assigning the first layer and the second layer to be trained by the model training application in the third container; and
      
      receiving parameter data from the model training application in the third container.
  - 5. The method of claim 1, wherein:
    - the processing node comprises a local operating system, and
      
      the model training application is configured to run on an operating system different from the local operating system.
  - 6. The method of claim 5, wherein the local operating system is MICROSOFT WINDOWS®.
  - 7. The method of claim 6, wherein the application is configured to run on LINUX.
  - 8. The method of claim 1, wherein calculating the model parameter based on the parameter data comprises applying a parameter averaging method to the parameter data.
  - 9. The method of claim 1, wherein calculating the model parameter based on the parameter data comprises applying a gradient descent method to the parameter data.
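
Claims 8 and 9 name two ways of calculating a model parameter from the parameter data returned by the containers. The sketch below shows one plausible reading of each, assuming that every container reports a flat list of floats for the same set of parameters (for example, when more than one container trains the same layer, as in claim 4). The function names, the list representation, and the default learning rate are assumptions for illustration, not details from the application.

```python
def average_parameters(parameter_sets):
    """Element-wise mean of equally long parameter vectors, one per container."""
    if not parameter_sets:
        raise ValueError("no parameter data received")
    length = len(parameter_sets[0])
    if any(len(values) != length for values in parameter_sets):
        raise ValueError("parameter vectors must have the same length")
    count = len(parameter_sets)
    return [sum(column) / count for column in zip(*parameter_sets)]


def gradient_descent_step(parameters, gradient_sets, learning_rate=0.1):
    """Average the gradients reported by the containers, then take one step."""
    mean_gradient = average_parameters(gradient_sets)
    return [p - learning_rate * g for p, g in zip(parameters, mean_gradient)]


if __name__ == "__main__":
    # Parameter averaging (claim 8): two containers report their local values.
    print(average_parameters([[1.0, 2.0], [3.0, 4.0]]))
    # Gradient descent (claim 9): two containers report gradients instead.
    print(gradient_descent_step([1.0, 1.0], [[2.0, 0.0], [4.0, 2.0]], 0.5))
```

In the first case the containers send trained parameter values and the coordinator averages them; in the second they send gradients and the coordinator applies the update itself.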

10. An apparatus for managing deployment of distributed computing resources, comprising:
- a memory comprising computer-executable instructions; and
  
  a processor in data communication with the memory and configured to execute the computer-executable instructions and cause the apparatus to perform a method for training a machine learning model in a distributed computing system, the method comprising:
  
  receiving a model training request;
  
  receiving a training data set;
  
  determining a processing node available in a distributed computing system;
  
  receiving static status information regarding the processing node;
  
  causing a first container to be installed at the processing node based on the static status information, the first container being configured with a model training application;
  
  causing a second container to be installed at the processing node based on the static status information, the second container being configured with the model training application;
  
  assigning a first layer of a model to be trained by the model training application in the first container;
  
  assigning a second layer of the model to be trained by the model training application in the second container;
  
  receiving parameter data from the model training application in the first container and the model training application in the second container; and
  
  calculating a model parameter based on the parameter data.
  - 11. The apparatus of claim 10, wherein the method further comprises:
    - assigning a first data subset to the model training application in the first container; and
      
      assigning a second data subset to the model training application in the second container.
  - 12. The apparatus of claim 10, wherein the method further comprises:
    - assigning a first data subset to the model training application in the first container; and
      
      assigning the first data subset to the model training application in the second container.
  - 13. The apparatus of claim 10, wherein the method further comprises:
    - causing a third container to be installed at the processing node based on the static status information, the third container being configured with the model training application;
      
      assigning the first layer and the second layer to be trained by the model training application in the third container; and
      
      receiving parameter data from the model training application in the third container.
  - 14. The apparatus of claim 10, wherein:
    - the processing node comprises a local operating system, and
      
      the model training application is configured to run on an operating system different from the local operating system.
  - 15. The apparatus of claim 14, wherein the local operating system is MICROSOFT WINDOWS®.
  - 16. The apparatus of claim 15, wherein the application is configured to run on LINUX.
  - 17. The apparatus of claim 10, wherein calculating the model parameter based on the parameter data comprises applying a parameter averaging method to the parameter data.
  - 18. The apparatus of claim 10, wherein calculating the model parameter based on the parameter data comprises applying a gradient descent method to the parameter data.
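
Claims 11 and 12 (like claims 2 and 3) distinguish two ways of giving training data to the containers: each container receives a different subset, or both containers receive the same subset. A minimal sketch of both options follows; the function names and the contiguous, near-equal split are assumptions for illustration and are not specified in the claims.

```python
def split_into_subsets(records, container_count):
    """Partition records into near-equal contiguous subsets, one per container."""
    if container_count < 1:
        raise ValueError("container_count must be at least 1")
    base, extra = divmod(len(records), container_count)
    subsets, start = [], 0
    for index in range(container_count):
        # The first `extra` containers take one additional record.
        size = base + (1 if index < extra else 0)
        subsets.append(records[start:start + size])
        start += size
    return subsets


def replicate_subset(subset, container_count):
    """Give every container its own copy of the same data subset."""
    return [list(subset) for _ in range(container_count)]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(split_into_subsets(data, 2))      # different subset per container
    print(replicate_subset(data[:2], 2))    # same subset for every container
```

The first option pairs naturally with combining results across containers; the second fits the case where each container trains a different layer on identical data.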

19. A non-transitory computer-readable medium comprising instructions for performing a method for training a machine learning model in a distributed computing system, the method comprising:
- receiving a model training request;
  
  receiving a training data set;
  
  determining a processing node available in a distributed computing system;
  
  receiving static status information regarding the processing node;
  
  causing a first container to be installed at the processing node based on the static status information, the first container being configured with a model training application;
  
  causing a second container to be installed at the processing node based on the static status information, the second container being configured with the model training application;
  
  assigning a first layer of a model to be trained by the model training application in the first container;
  
  assigning a second layer of the model to be trained by the model training application in the second container;
  
  receiving parameter data from the model training application in the first container, the model training application in the second container, and the model training application in the third container; and
  
  calculating a model parameter based on the parameter data.
  - 20. The non-transitory computer-readable medium of claim 19, wherein:
    - the processing node comprises a local operating system, and
      
      the model training application is configured to run on an operating system different from the local operating system.
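
Claims 5 through 7, 14 through 16, and 20 cover the case where the node's local operating system differs from the one the model training application runs on. The sketch below shows how a coordinator might use static status information to detect that case before installing a container. The dictionary keys (`node`, `os`), the function names, and the shape of the returned plan are all hypothetical; the claims do not define the contents of the static status information.

```python
def needs_guest_os(local_os, application_os):
    """Return True when the application targets an OS other than the node's."""
    return local_os.strip().lower() != application_os.strip().lower()


def plan_container(static_status, application_os="linux"):
    """Build a hypothetical install plan from a node's static status record."""
    local_os = static_status["os"]
    return {
        "node": static_status["node"],
        "guest_os": application_os,
        # True means the container must supply an OS the node does not run.
        "os_differs": needs_guest_os(local_os, application_os),
    }


if __name__ == "__main__":
    # A Windows node hosting a Linux-based training application (claims 6, 7).
    print(plan_container({"node": "node-a", "os": "Windows"}))
```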

Current Assignee
Kazuhm, Inc.
Original Assignee
Kazuhm, Inc.
Inventors
KULKARNI, Rounak Prasad; KADIYAN, Armin; O'NEAL, Tim

Application Number

US16/154,562
Publication Number

US 20190318240A1
CPC Class Codes

G06F 18/214   Generating training pattern...

G06F 2009/45562   Creating, deleting, cloning...

G06F 2009/45587   Isolation or security of vi...

G06F 8/61   Installation

G06F 8/63   Image based installation; C...

G06F 9/455   Emulation; Interpretation; ...

G06F 9/45558   Hypervisor-specific managem...

G06F 9/5044   considering hardware capabi...

G06F 9/5072   Grid computing

G06F 9/5077   Logical partitioning of res...

G06F 9/546   Message passing systems or ...

G06N 20/00   Machine learning

G06N 3/04   Architecture, e.g. intercon...

G06N 3/045   Combinations of networks

G06N 3/063   using electronic means

G06N 3/08   Learning methods

H04L 41/046   comprising network manageme...

H04L 43/0876   Network utilisation, e.g. v...

H04L 67/34   involving the movement of s...
