Systems and methods to perform machine learning with feedback consistency
First Claim
1. A computer-implemented method to perform machine learning, the method comprising:
obtaining, by one or more computing devices, data descriptive of an encoder model that is configured to receive a first set of inputs and, in response to receipt of the first set of inputs, output a first set of outputs;
obtaining, by the one or more computing devices, data descriptive of a decoder model that is configured to receive the first set of outputs and, in response to receipt of the first set of outputs, output a second set of outputs;
determining, by the one or more computing devices, a loss function that describes a difference between the first set of inputs and the second set of outputs;
backpropagating, by the one or more computing devices, the loss function through the decoder model without modifying the decoder model; and
after backpropagating, by the one or more computing devices, the loss function through the decoder model, continuing to backpropagate, by the one or more computing devices, the loss function through the encoder model to train the encoder model;
wherein continuing to backpropagate, by the one or more computing devices, the loss function through the encoder model to train the encoder model comprises adjusting, by the one or more computing devices, at least one weight included in the encoder model.
Abstract
The present disclosure provides systems and methods that enable training of an encoder model based on a decoder model that performs an inverse transformation relative to the encoder model. In one example, an encoder model can receive a first set of inputs and output a first set of outputs. The encoder model can be a neural network. The decoder model can receive the first set of outputs and output a second set of outputs. A loss function can describe a difference between the first set of inputs and the second set of outputs. According to an aspect of the present disclosure, the loss function can be sequentially backpropagated through the decoder model without modifying the decoder model and then through the encoder model while modifying the encoder model, thereby training the encoder model. Thus, an encoder model can be trained to have enforced consistency relative to the inverse decoder model.
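As an illustrative sketch (not part of the patent text), the training scheme in the abstract can be demonstrated with a toy linear encoder and a frozen linear decoder: the reconstruction loss is backpropagated through the decoder without updating it, and only the encoder weights are adjusted. All names, shapes, the learning rate, and the step count below are assumptions chosen for illustration, not values from the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 4-dim inputs, 2-dim encoder outputs, 64 examples.
d_in, d_code, n = 4, 2, 64
X = rng.normal(size=(d_in, n))            # first set of inputs (one example per column)

W_d = rng.normal(size=(d_in, d_code))     # decoder weights: frozen, never updated
W_e = 0.1 * rng.normal(size=(d_code, d_in))  # encoder weights: trained
W_d_init = W_d.copy()                     # snapshot to confirm the decoder stays unmodified

def loss(W_e):
    """Mean squared difference between the first set of inputs and the second set of outputs."""
    second_outputs = W_d @ (W_e @ X)      # decoder applied to the encoder's outputs
    return float(np.mean((X - second_outputs) ** 2))

loss_init = loss(W_e)
lr = 0.05
for _ in range(200):
    H = W_e @ X                           # first set of outputs (the codes)
    R = X - W_d @ H                       # reconstruction residual
    # Backpropagate the loss through the decoder: W_d is only read here,
    # giving the gradient with respect to the encoder's outputs H.
    grad_H = -2.0 * (W_d.T @ R) / R.size
    # Continue backpropagating through the encoder and adjust its weights.
    grad_We = grad_H @ X.T
    W_e -= lr * grad_We

print(f"initial loss {loss_init:.4f} -> final loss {loss(W_e):.4f}")
```

The loss decreases while `W_d` is left exactly as initialized, which is the "feedback consistency" arrangement the claims describe: the decoder serves as a fixed inverse transformation that shapes the encoder's training signal.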
Claims (20)
1. A computer-implemented method to perform machine learning, the method comprising:

obtaining, by one or more computing devices, data descriptive of an encoder model that is configured to receive a first set of inputs and, in response to receipt of the first set of inputs, output a first set of outputs;
obtaining, by the one or more computing devices, data descriptive of a decoder model that is configured to receive the first set of outputs and, in response to receipt of the first set of outputs, output a second set of outputs;
determining, by the one or more computing devices, a loss function that describes a difference between the first set of inputs and the second set of outputs;
backpropagating, by the one or more computing devices, the loss function through the decoder model without modifying the decoder model; and
after backpropagating, by the one or more computing devices, the loss function through the decoder model, continuing to backpropagate, by the one or more computing devices, the loss function through the encoder model to train the encoder model;
wherein continuing to backpropagate, by the one or more computing devices, the loss function through the encoder model to train the encoder model comprises adjusting, by the one or more computing devices, at least one weight included in the encoder model.

Dependent claims: 2-11.
12. A computing system to perform machine learning, the computing system comprising:
at least one processor; and
at least one tangible, non-transitory computer-readable medium that stores instructions that, when executed by the at least one processor, cause the computing system to:

obtain data descriptive of a model that comprises an encoder model and a decoder model, wherein the encoder model is configured to receive a first set of inputs and, in response to receipt of the first set of inputs, output a first set of outputs, and wherein the decoder model is configured to receive the first set of outputs and, in response to receipt of the first set of outputs, output a second set of outputs;
determine a loss function that describes a difference between the first set of inputs and the second set of outputs;
backpropagate the loss function through the decoder model without modifying the decoder model; and
after backpropagating the loss function through the decoder model, continue to backpropagate the loss function through the encoder model while modifying the encoder model to train the encoder model.

Dependent claims: 13-17.
18. A computing system, comprising:
at least one processor; and
at least one memory that stores a machine-learned encoder model that is configured to receive a first set of inputs and output a first set of outputs, the encoder model having been trained by sequentially backpropagating a loss function through a decoder model without modifying the decoder model and then through the encoder model to modify at least one weight of the encoder model, the decoder model configured to receive the first set of outputs and output a second set of outputs, the loss function descriptive of a difference between the first set of inputs and the second set of outputs.

Dependent claims: 19-20.
Specification