Training variational autoencoders to generate disentangled latent factors

US 10,643,131 B1
Filed: 08/05/2019
Issued: 05/05/2020
Est. Priority Date: 05/20/2016
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a variational auto-encoder (VAE) to generate disentangled latent factors on unlabeled training images. In one aspect, a method includes receiving the plurality of unlabeled training images, and, for each unlabeled training image, processing the unlabeled training image using the VAE to determine the latent representation of the unlabeled training image and to generate a reconstruction of the unlabeled training image in accordance with current values of the parameters of the VAE, and adjusting current values of the parameters of the VAE by optimizing a loss function that depends on a quality of the reconstruction and also on a degree of independence between the latent factors in the latent representation of the unlabeled training image.

20 Claims

1. A method performed by one or more computers for training a variational auto-encoder (VAE) to generate disentangled latent factors on a plurality of unlabeled training images, wherein the VAE has a plurality of parameters and is configured to receive an input image, process the input image to determine a latent representation of the input image that includes a plurality of latent factors, and to process the latent representation to generate a reconstruction of the input image, and wherein the method comprises:
- receiving the plurality of unlabeled training images, and, for each unlabeled training image:
  - processing the unlabeled training image using the VAE to determine the latent representation of the unlabeled training image and to generate a reconstruction of the unlabeled training image in accordance with current values of the parameters of the VAE, and
  - adjusting current values of the parameters of the VAE by determining a gradient of a loss function with respect to the parameters of the VAE, wherein the loss function L is of the form L = Q − B(KL), where Q is a term that depends on a quality of the reconstruction of the unlabeled training image, KL is a term that measures a degree of independence between the latent factors in the latent representation of the unlabeled training image, and B is a fixed value that is greater than one.

2. The method of claim 1, further comprising, after the VAE has been trained, using latent representations generated by the VAE as features of input images.
3. The method of claim 1, wherein KL is a Kullback-Leibler divergence between probability distributions qϕ(z|x) and p(z), qϕ(z|x) is a probability distribution defined by the output of an encoder of the VAE and p(z) is a prior probability distribution over the latent factors.
4. The method of claim 1, wherein B is a value in a range of 2, exclusive, to 250, inclusive.
5. The method of claim 4, wherein B is four.
6. The method of claim 1, wherein the value of B is dependent on a number of latent factors in the latent representation of the input image.
7. The method of claim 1, wherein the degree of independence between the latent factors is computed within a latent bottleneck with restricted effective capacity.
8. The method of claim 1, wherein the VAE includes a latent bottleneck layer, the KL term also measures the capacity of the latent bottleneck layer, and the capacity of the latent bottleneck layer is adjusted simultaneously with the degree of independence.
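
The loss form recited in claims 1 and 3 can be illustrated with a minimal Python sketch. Several choices here are assumptions made only for illustration and are not stated in the claims: the encoder output is treated as a diagonal Gaussian, the prior p(z) as a standard normal, the reconstruction-quality term Q as a Bernoulli log-likelihood over pixel intensities in [0, 1], and L as a quantity to be maximized. The function names are hypothetical.

```python
import math


def gaussian_kl(mu, log_var):
    """KL(q(z|x) || p(z)) for q = N(mu, diag(exp(log_var))) and p = N(0, I).

    Assumed closed form; the claims only say KL is a divergence between
    the encoder distribution and a prior over the latent factors.
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def bernoulli_log_likelihood(x, x_hat, eps=1e-7):
    """Assumed choice for Q: log-likelihood of pixels x under means x_hat."""
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total


def beta_weighted_objective(x, x_hat, mu, log_var, beta=4.0):
    """L = Q - B * KL, with B a fixed value greater than one."""
    if beta <= 1.0:
        raise ValueError("B must be greater than one")
    q = bernoulli_log_likelihood(x, x_hat)
    kl = gaussian_kl(mu, log_var)
    return q - beta * kl


# Illustrative values only: two pixels, one latent factor.
x = [1.0, 0.0]
x_hat = [0.9, 0.2]
print(beta_weighted_objective(x, x_hat, mu=[1.0], log_var=[0.0], beta=4.0))
```

With B equal to one this would reduce to the usual VAE lower bound; a larger B puts more weight on keeping the encoder distribution close to the prior, at the cost of reconstruction quality.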

9. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations for training a variational auto-encoder (VAE) to generate disentangled latent factors on a plurality of unlabeled training images, wherein the VAE has a plurality of parameters and is configured to receive an input image, process the input image to determine a latent representation of the input image that includes a plurality of latent factors, and to process the latent representation to generate a reconstruction of the input image, and wherein the operations comprise:
- receiving the plurality of unlabeled training images, and, for each unlabeled training image:
  - processing the unlabeled training image using the VAE to determine the latent representation of the unlabeled training image and to generate a reconstruction of the unlabeled training image in accordance with current values of the parameters of the VAE, and
  - adjusting current values of the parameters of the VAE by determining a gradient of a loss function with respect to the parameters of the VAE, wherein the loss function L is of the form L = Q − B(KL), where Q is a term that depends on a quality of the reconstruction of the unlabeled training image, KL is a term that measures a degree of independence between the latent factors in the latent representation of the unlabeled training image, and B is a fixed value that is greater than one.

10. The system of claim 9, the operations further comprising, after the VAE has been trained, using latent representations generated by the VAE as features of input images.
11. The system of claim 9, wherein KL is a Kullback-Leibler divergence between probability distributions qϕ(z|x) and p(z), qϕ(z|x) is a probability distribution defined by the output of an encoder of the VAE and p(z) is a prior probability distribution over the latent factors.
12. The system of claim 9, wherein B is a value in a range of 2, exclusive, to 250, inclusive.
13. The system of claim 12, wherein B is four.
14. The system of claim 9, wherein the degree of independence between the latent factors is computed within a latent bottleneck with restricted effective capacity.
15. The system of claim 9, wherein the VAE includes a latent bottleneck layer, the KL term also measures a capacity of the latent bottleneck layer, and the capacity of the latent bottleneck layer is adjusted simultaneously with the degree of independence.
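
The per-image update recited in the independent claims (process the image, then adjust the parameters using a gradient of L) can be sketched on a deliberately tiny model. Everything below is a hypothetical illustration: the model has one pixel, one latent factor and three scalar parameters, Q is taken to be a negative squared error, the noise sample `eps` is held fixed, L is assumed to be maximized, and central finite differences stand in for whatever gradient computation an actual implementation would use (for example backpropagation).

```python
import math


def objective(params, x, eps, beta):
    """L = Q - B * KL for a toy model (all modelling choices are assumptions)."""
    w_enc, log_var, w_dec = params
    mu = w_enc * x                              # encoder mean for the latent factor
    z = mu + math.exp(0.5 * log_var) * eps      # latent sample via reparameterization
    x_hat = w_dec * z                           # decoder reconstruction
    q = -(x - x_hat) ** 2                       # reconstruction quality (higher is better)
    kl = 0.5 * (mu * mu + math.exp(log_var) - log_var - 1.0)  # KL to N(0, 1)
    return q - beta * kl


def numerical_gradient(f, params, h=1e-6):
    """Central-difference estimate of the gradient of f at params."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def training_step(params, x, eps, beta=4.0, lr=0.05):
    """One gradient-ascent update of the parameters for a single image."""
    grad = numerical_gradient(lambda p: objective(p, x, eps, beta), params)
    return [p + lr * g for p, g in zip(params, grad)]


params = [0.5, 0.0, 0.5]  # hypothetical initial parameter values
before = objective(params, x=1.0, eps=0.5, beta=4.0)
params = training_step(params, x=1.0, eps=0.5)
after = objective(params, x=1.0, eps=0.5, beta=4.0)
print(before, after)
```

In this toy setting a single small step raises L; a real VAE would draw fresh noise for each image and update many more parameters per step.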

16. A non-transitory computer-readable storage medium encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations for training a variational auto-encoder (VAE) to generate disentangled latent factors on a plurality of unlabeled training images, wherein the VAE is configured to receive an input image, process the input image to determine a latent representation of the input image that includes a plurality of latent factors, and to process the latent representation to generate a reconstruction of the input image, and wherein the operations comprise:
- receiving the plurality of unlabeled training images, and, for each unlabeled training image:
  - processing the unlabeled training image using the VAE to determine the latent representation of the unlabeled training image and to generate a reconstruction of the unlabeled training image in accordance with current values of the parameters of the VAE, and
  - adjusting current values of the parameters of the VAE by determining a gradient of a loss function with respect to the parameters of the VAE, wherein the loss function L is of the form L = Q − B(KL), where Q is a term that depends on a quality of the reconstruction of the unlabeled training image, KL is a term that measures a degree of independence between the latent factors in the latent representation of the unlabeled training image, and B is a fixed value that is greater than one.

17. The computer-readable storage medium of claim 16, the operations further comprising, after the VAE has been trained, using latent representations generated by the VAE as features of input images.
18. The computer-readable storage medium of claim 16, wherein KL is a Kullback-Leibler divergence between probability distributions qϕ(z|x) and p(z), qϕ(z|x) is a probability distribution defined by the output of an encoder of the VAE and p(z) is a prior probability distribution over the latent factors.
19. The computer-readable storage medium of claim 16, wherein B is a value in a range of 2, exclusive, to 250, inclusive.
20. The computer-readable storage medium of claim 16, wherein the VAE includes a latent bottleneck layer, the KL term also measures a capacity of the latent bottleneck layer, and the capacity of the latent bottleneck layer is adjusted simultaneously with the degree of independence.
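
The numeric conditions on B differ between the independent claims (greater than one) and the dependent range claims (2, exclusive, to 250, inclusive), and the boundary handling is easy to misread. The short Python sketch below only checks a number against those two intervals; the function names are hypothetical, and the check says nothing about claim scope in any legal sense.

```python
def beta_exceeds_one(beta):
    """Condition in the independent claims: B is greater than one."""
    return beta > 1


def beta_in_recited_range(beta):
    """Range in claims 4, 12 and 19: 2 exclusive to 250 inclusive."""
    return 2 < beta <= 250


# Boundary cases: 2 is outside the narrower range, 250 is inside it.
for b in (1, 2, 4, 250, 251):
    print(b, beta_exceeds_one(b), beta_in_recited_range(b))
```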

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: DeepMind Technologies Limited (Alphabet Inc.)
Inventors: Matthey-de-l'Endroit, Loic; Pal, Arka Tilak; Mohamed, Shakir; Glorot, Xavier; Higgins, Irina; Lerchner, Alexander
Primary Examiner(s): Bayat, Ali
Application Number: US16/532,191
Time in Patent Office: 274 Days
Field of Search: 382159

CPC Class Codes:
- G06F 17/18: for evaluating statistical ...
- G06F 18/2155: characterised by the incorp...
- G06N 3/045: Combinations of networks
- G06N 3/084: Backpropagation, e.g. using...
- G06N 7/01: Probabilistic graphical mod...
- G06V 10/7753: Incorporation of unlabelled...
