DISTRIBUTED MODEL TRAINING
Abstract
In one embodiment, a device determines that a machine learning model is to be trained by a plurality of devices in a network. A set of training devices are identified from among the plurality of devices to train the model, with each of the training devices having a local set of training data. An instruction is then sent to each of the training devices that is configured to cause a training device to receive model parameters from a first training device in the set, use the parameters with at least a portion of the local set of training data to generate new model parameters, and forward the new model parameters to a second training device in the set. Model parameters from the training devices are also received that have been trained using a global set of training data that includes the local sets of training data on the training devices.
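The flow described in the abstract — model parameters passed from device to device, each device refining them on local data that never leaves it — can be sketched as follows. This is an illustrative reconstruction, not the patented implementation: the linear model, gradient-descent update, learning rate, and the `local_update`/`ring_train` names are all assumptions made for the example.

```python
import numpy as np

def local_update(params, X, y, lr=0.1, epochs=5):
    """One training device's step: use the received parameters with the
    local set of training data to generate new parameters
    (here, least-squares gradient descent)."""
    w = params.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def ring_train(shards, n_features, rounds=10):
    """Coordinator view: circulate the parameters through every training
    device in the set; each device trains locally and forwards the new
    parameters to the next device."""
    params = np.zeros(n_features)
    for _ in range(rounds):
        for X, y in shards:              # "forward to a second training device"
            params = local_update(params, X, y)
    return params                        # finally received back by the coordinator

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
# Each device holds a private shard; together the shards form the
# global set of training data of the claims.
shards = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    shards.append((X, X @ true_w))

w = ring_train(shards, n_features=2)
```

Because every shard contributes updates each round, the returned parameters reflect the global training data even though no raw data is ever centralized.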
22 Claims
1. A method comprising:
determining, by a device, that a machine learning model is to be trained by a plurality of devices in a network;

identifying a set of training devices from among the plurality of devices in the network to train the machine learning model, wherein each of the training devices has a local set of training data;

sending an instruction to each of the training devices, wherein the instruction is configured to cause a training device to receive model parameters from a first training device in the set, use the received model parameters with at least a portion of the local set of training data to generate new model parameters, and forward the new model parameters to a second training device in the set; and

receiving model parameters from the training devices that have been trained using a global set of training data comprising the local sets of training data on the training devices.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. An apparatus, comprising:
one or more network interfaces to communicate in a computer network;

a processor coupled to the network interfaces and configured to execute one or more processes; and

a memory configured to store a process executable by the processor, the process when executed operable to:

determine that a machine learning model is to be trained by a plurality of devices in a network;

identify a set of training devices from among the plurality of devices in the network to train the machine learning model, wherein each of the training devices has a local set of training data;

send an instruction to each of the training devices, wherein the instruction is configured to cause a training device to receive model parameters from a first training device in the set, use the received model parameters with at least a portion of the local set of training data to generate new model parameters, and forward the new model parameters to a second training device in the set; and

receive model parameters from the training devices that have been trained using a global set of training data comprising the local sets of training data on the training devices.

Dependent claims: 12, 13, 14, 15, 16.
17. A method comprising:
receiving, from a first training device, model parameters for a machine learning model;

using the received model parameters to generate new model parameters by training the machine learning model with a local set of training data; and

sending the new model parameters to a second training device configured to use the sent model parameters and a second set of training data on the second training device to train the machine learning model.

Dependent claims: 18, 19.
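Claim 17 is the per-device view of the same ring: receive parameters, train on the local data, send the result onward. A minimal object sketch under the same assumptions as before — the `TrainingDevice` class, its method names, and the in-process hand-off are hypothetical; a real deployment would receive and forward parameters over network interfaces:

```python
import numpy as np

class TrainingDevice:
    """One ring participant: receives parameters from a first device,
    trains on its local data, forwards new parameters to a second device."""
    def __init__(self, X, y, next_device=None):
        self.X, self.y = X, y            # local data never leaves the device
        self.next_device = next_device
        self.received = None

    def receive(self, params):
        self.received = np.asarray(params, dtype=float)

    def train_and_forward(self, lr=0.1, epochs=5):
        w = self.received.copy()
        for _ in range(epochs):          # least-squares gradient descent
            w -= lr * 2.0 * self.X.T @ (self.X @ w - self.y) / len(self.y)
        if self.next_device is not None:
            self.next_device.receive(w)  # "sending the new model parameters"
        return w

rng = np.random.default_rng(1)
true_w = np.array([1.0, 3.0])
def shard():
    X = rng.normal(size=(40, 2))
    return X, X @ true_w

d2 = TrainingDevice(*shard())
d1 = TrainingDevice(*shard(), next_device=d2)

d1.receive(np.zeros(2))
for _ in range(20):
    d1.train_and_forward()               # d1 trains, forwards to d2
    w = d2.train_and_forward()           # d2 trains; end of this pass
    d1.receive(w)                        # close the ring for the next pass
```

Chaining more devices only requires extending the `next_device` links; the last device hands its parameters back to the first (or to the coordinator of claim 1).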
20. An apparatus, comprising:
one or more network interfaces to communicate in a computer network;

a processor coupled to the network interfaces and adapted to execute one or more processes; and

a memory configured to store a process executable by the processor, the process when executed operable to:

receive, from a first training device, model parameters for a machine learning model;

use the received model parameters to generate new model parameters by training the machine learning model with a local set of training data; and

send the new model parameters to a second training device configured to use the sent model parameters and a second set of training data on the second training device to train the machine learning model.

Dependent claims: 21.
22. A tangible, non-transitory, computer-readable media having software encoded thereon, the software when executed by a processor operable to:
determine that a machine learning model is to be trained by a plurality of devices in a network;

identify a set of training devices from among the plurality of devices in the network to train the machine learning model, wherein each of the training devices has a local set of training data;

send an instruction to each of the training devices, wherein the instruction is configured to cause a training device to receive model parameters from a first training device in the set, use the received model parameters with at least a portion of the local set of training data to generate new model parameters, and forward the new model parameters to a second training device in the set; and

receive model parameters from the training devices that have been trained using a global set of training data comprising the local sets of training data on the training devices.
Specification