METHOD AND SYSTEM FOR DISTRIBUTED MACHINE LEARNING
Abstract
Method, system, and programs for distributed machine learning on a cluster including a plurality of nodes are disclosed. A machine learning process is performed in each of the plurality of nodes based on a respective subset of training data to calculate a local parameter. The training data is partitioned over the plurality of nodes. A plurality of operation nodes are determined from the plurality of nodes based on a status of the machine learning process performed in each of the plurality of nodes. The plurality of operation nodes are connected to form a network topology. An aggregated parameter is generated by merging local parameters calculated in each of the plurality of operation nodes in accordance with the network topology.
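The abstract's data flow can be sketched in a few lines. The sketch below is illustrative only: the per-node "machine learning process" is stood in for by a simple mean over scalar training data, and every function and variable name is invented here, not taken from the patent.

```python
# Illustrative stand-in for the per-node "machine learning process":
# here, simply the mean of the node's subset of scalar training data.
def local_parameter(subset):
    return sum(subset) / len(subset)

def distributed_learn(training_data, num_nodes):
    # Partition the training data over the plurality of nodes.
    size = -(-len(training_data) // num_nodes)  # ceiling division
    subsets = [training_data[i:i + size]
               for i in range(0, len(training_data), size)]
    # Each node calculates a local parameter from its own subset.
    local_params = [local_parameter(s) for s in subsets]
    # Merge the local parameters into a single aggregated parameter.
    return sum(local_params) / len(local_params)
```

For example, `distributed_learn([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)` partitions the data into three two-element subsets and averages the three local means.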
24 Claims
1. A method, implemented on at least one machine each of which has at least one processor, storage, and a communication platform connected to a network for distributed machine learning on a cluster including a plurality of nodes, the method comprising the steps of:
performing a machine learning process in each of the plurality of nodes based on a respective subset of training data to calculate a local parameter, wherein the training data is partitioned over the plurality of nodes;
determining a plurality of operation nodes from the plurality of nodes based on a status of the machine learning process performed in each of the plurality of nodes;
connecting the plurality of operation nodes to form a network topology; and
generating an aggregated parameter by merging local parameters calculated in each of the plurality of operation nodes in accordance with the network topology.
Dependent claims: 2, 3, 4, 5, 6.
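Read as an algorithm, the four steps of claim 1 might be sketched as follows. The interpretation of "status" as a finished/unfinished flag, the choice of a ring as the network topology, and averaging as the merge operation are all assumptions made here for illustration, not specifics from the claim.

```python
def aggregate(nodes):
    # nodes: list of (local_parameter, process_finished) pairs, one per
    # node; process_finished stands in for the "status" of the machine
    # learning process on that node. Assumes at least one node finished.
    # Step 2: determine operation nodes based on each node's status.
    op_params = [param for param, finished in nodes if finished]
    # Step 3: connect the operation nodes to form a ring network topology.
    ring = [(i, (i + 1) % len(op_params)) for i in range(len(op_params))]
    # Step 4: merge local parameters in accordance with the topology by
    # passing a running sum along the ring, then averaging.
    total, current = 0.0, 0
    for _ in ring:
        total += op_params[current]
        current = ring[current][1]
    return total / len(op_params)
```

A ring is only one possible topology; the claim leaves the topology open, and a tree or butterfly would merge the same local parameters with fewer hops.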
7. A system for distributed machine learning, the system comprising:
a plurality of nodes, each node is configured to perform a machine learning process based on a respective subset of training data to calculate a local parameter, wherein the training data is partitioned over the plurality of nodes; and
a coordination node operatively coupled to the plurality of operation nodes, configured to:
determine a plurality of operation nodes from the plurality of nodes based on a status of the machine learning process performed in each of the plurality of nodes, and
connect the plurality of operation nodes to form a network topology,
wherein the plurality of operation nodes are configured to generate an aggregated parameter by merging local parameters calculated in each of the plurality of operation nodes in accordance with the network topology.
Dependent claims: 8, 9, 10, 11, 12.
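Claim 7's division of labor, a coordination node that selects and wires up the workers, while the operation nodes themselves do the merging, could look roughly like the sketch below. The class names, the finished-flag status check, and the ring wiring are all illustrative assumptions.

```python
class Worker:
    """One node of the cluster, holding its local parameter and status."""
    def __init__(self, local_param, finished):
        self.local_param = local_param
        self.finished = finished      # status of its ML process
        self.neighbor = None          # set by the coordinator

class Coordinator:
    """Stand-in for the coordination node of claim 7."""
    def select_and_connect(self, workers):
        # Determine operation nodes from each worker's process status.
        ops = [w for w in workers if w.finished]
        # Connect them to form a ring network topology.
        for i, w in enumerate(ops):
            w.neighbor = ops[(i + 1) % len(ops)]
        return ops

def aggregate(ops):
    # The operation nodes (not the coordinator) merge local parameters in
    # accordance with the topology: a running sum passed around the ring.
    total, node = 0.0, ops[0]
    for _ in range(len(ops)):
        total += node.local_param
        node = node.neighbor
    return total / len(ops)
```

Note the design point the claim makes explicit: the coordinator only determines and connects the operation nodes; the aggregated parameter is generated by the operation nodes themselves.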
13. A machine-readable tangible and non-transitory medium having information for distributed machine learning on a cluster including a plurality of nodes recorded thereon, wherein the information, when read by the machine, causes the machine to perform the following:
partitioning training data over the plurality of nodes such that each of the plurality of nodes stores a subset of the training data, wherein a machine learning process is performed in each of the plurality of nodes based on a respective subset of the training data to calculate a local parameter;
performing a machine learning process in each of the plurality of nodes based on a respective subset of training data to calculate a local parameter, wherein the training data is partitioned over the plurality of nodes;
determining a plurality of operation nodes from the plurality of nodes based on a status of the machine learning process performed in each of the plurality of nodes;
connecting the plurality of operation nodes to form a network topology; and
generating an aggregated parameter by merging local parameters calculated in each of the plurality of operation nodes in accordance with the network topology.
Dependent claims: 14, 15, 16, 17, 18.
19. A method, implemented on at least one machine each of which has at least one processor, storage, and a communication platform connected to a network for distributed machine learning on a cluster including a plurality of nodes, the method comprising the steps of:
storing a subset of training data that is partitioned over the plurality of nodes;
performing a stochastic gradient descent process based on the subset of the training data to calculate an initial local parameter;
transmitting the initial local parameter to at least one connected node in accordance with a network topology;
receiving an initial aggregated parameter from the at least one connected node, wherein the initial aggregated parameter is calculated by merging initial local parameters calculated by each of the plurality of nodes in accordance with the network topology;
performing a batch gradient descent process based on the received initial aggregated parameter and the subset of the training data to calculate an updated local parameter; and
transmitting the updated local parameter to the at least one connected node in accordance with the network topology for calculating an updated aggregated parameter.
Dependent claim: 20.
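Claim 19's two-phase scheme, local stochastic gradient descent to warm-start, an all-reduce to average the initial local parameters, then batch gradient descent from the shared starting point, can be simulated in-process as below. The 1-D least-squares model, the learning rates, the epoch counts, and averaging as the merge are all assumptions for illustration; the all-reduce here is simulated rather than sent over a real network.

```python
def sgd(subset, lr=0.1, epochs=20):
    # Stochastic phase: update after every (x, y) example.
    w = 0.0
    for _ in range(epochs):
        for x, y in subset:
            w -= lr * 2 * x * (w * x - y)   # gradient of (w*x - y)^2
    return w

def batch_gd(subset, w, lr=0.1, epochs=20):
    # Batch phase: one update per pass over the whole local subset.
    for _ in range(epochs):
        g = sum(2 * x * (w * x - y) for x, y in subset) / len(subset)
        w -= lr * g
    return w

def all_reduce(values):
    # Simulated all-reduce: every node receives the average.
    return sum(values) / len(values)

# Two nodes, each storing a subset of data drawn from y = 2x.
subsets = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
initial_locals = [sgd(s) for s in subsets]           # per-node SGD
w0 = all_reduce(initial_locals)                      # initial aggregated parameter
updated_locals = [batch_gd(s, w0) for s in subsets]  # batch GD from w0
```

With this toy data every node's updated local parameter converges toward the true slope of 2, which is the point of the scheme: cheap noisy SGD gets each node near a good region, and the synchronized batch phase refines from a common aggregated start.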
21. An apparatus comprising:
a storage configured to store a subset of training data that is partitioned over the plurality of nodes;
an AllReducing module configured to:
transmit a local parameter to at least one connected node in accordance with a network topology, and
receive an aggregated parameter from the at least one connected node, wherein an initial aggregated parameter is calculated by merging initial local parameters calculated by each of the plurality of nodes in accordance with the network topology; and
a machine learning module configured to:
perform a stochastic gradient descent process based on the subset of the training data to calculate the initial local parameter, and
perform a batch gradient descent process based on the initial aggregated parameter and the subset of the training data to calculate an updated local parameter, wherein the updated local parameter is transmitted to the at least one connected node for calculating an updated aggregated parameter.
Dependent claim: 22.
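The "AllReducing module" of claim 21 transmits local values and receives a merged result. One common way such a module works, shown here as an in-process simulation rather than anything the claim specifies, is over a binary-tree topology: partial sums flow up to the root, and the total is broadcast back down so every node holds the same aggregated parameter. The tree layout and function name are assumptions.

```python
def tree_all_reduce(values):
    # values: one local parameter per node; node i's parent in the
    # implicit binary tree is node (i - 1) // 2, with node 0 as the root.
    n = len(values)
    partial = list(values)
    # Reduce phase: each non-root node sends its partial sum to its
    # parent, processed leaves-first so children fold in before parents.
    for i in range(n - 1, 0, -1):
        partial[(i - 1) // 2] += partial[i]
    # Broadcast phase: the root's total flows back down to every node.
    return [partial[0]] * n
```

Compared with the ring used in earlier sketches, a tree finishes the merge in O(log n) communication rounds instead of O(n), which is why all-reduce implementations often favor tree- or butterfly-shaped topologies.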
23. A machine-readable tangible and non-transitory medium having information for distributed machine learning on a cluster including a plurality of nodes recorded thereon, wherein the information, when read by the machine, causes the machine to perform the following:
storing a subset of training data that is partitioned over the plurality of nodes;
performing a stochastic gradient descent process based on the subset of the training data to calculate an initial local parameter;
transmitting the initial local parameter to at least one connected node in accordance with a network topology;
receiving an initial aggregated parameter from the at least one connected node, wherein the initial aggregated parameter is calculated by merging initial local parameters calculated by each of the plurality of nodes in accordance with the network topology;
performing a batch gradient descent process based on the received initial aggregated parameter and the subset of the training data to calculate an updated local parameter; and
transmitting the updated local parameter to the at least one connected node in accordance with the network topology for calculating an updated aggregated parameter.
Dependent claim: 24.
Specification