Training variational autoencoders to generate disentangled latent factors
First Claim
1. A method performed by one or more computers for training a variational auto-encoder (VAE) to generate disentangled latent factors on a plurality of unlabeled training images, wherein the VAE has a plurality of parameters and is configured to receive an input image, process the input image to determine a latent representation of the input image that includes a plurality of latent factors, and to process the latent representation to generate a reconstruction of the input image, and wherein the method comprises:
- receiving the plurality of unlabeled training images, and, for each unlabeled training image:
- processing the unlabeled training image using the VAE to determine the latent representation of the unlabeled training image and to generate a reconstruction of the unlabeled training image in accordance with current values of the parameters of the VAE, and
- adjusting current values of the parameters of the VAE by determining a gradient of a loss function with respect to the parameters of the VAE, wherein the loss function L is of the form L = Q − B(KL), where Q is a term that depends on a quality of the reconstruction of the unlabeled training image, KL is a term that measures a degree of independence between the latent factors in the latent representation of the unlabeled training image, and B is a fixed value that is greater than one.
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a variational auto-encoder (VAE) to generate disentangled latent factors on unlabeled training images. In one aspect, a method includes receiving the plurality of unlabeled training images, and, for each unlabeled training image, processing the unlabeled training image using the VAE to determine the latent representation of the unlabeled training image and to generate a reconstruction of the unlabeled training image in accordance with current values of the parameters of the VAE, and adjusting current values of the parameters of the VAE by optimizing a loss function that depends on a quality of the reconstruction and also on a degree of independence between the latent factors in the latent representation of the unlabeled training image.
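As an illustration only, the claimed loss form L = Q − B(KL) can be sketched in Python. The sketch assumes several choices that the claims do not state: Q is taken to be a Bernoulli log-likelihood over pixel values in [0, 1], the KL term is taken to be the divergence between a diagonal Gaussian over the latent factors and a unit Gaussian prior, and B = 4.0 is an arbitrary example value. All function and parameter names are hypothetical.

```python
import math


def reconstruction_quality(image, reconstruction, eps=1e-7):
    """Q (assumed form): Bernoulli log-likelihood of the pixels; higher is better."""
    total = 0.0
    for x, p in zip(image, reconstruction):
        p = min(max(p, eps), 1.0 - eps)  # clamp so that log() stays finite
        total += x * math.log(p) + (1.0 - x) * math.log(1.0 - p)
    return total


def kl_to_unit_gaussian(mu, log_var):
    """KL (assumed form): KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent factors."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def objective(image, reconstruction, mu, log_var, beta=4.0):
    """L = Q - B(KL), where beta plays the role of the fixed value B > 1."""
    if beta <= 1.0:
        raise ValueError("beta must be greater than one")
    q = reconstruction_quality(image, reconstruction)
    kl = kl_to_unit_gaussian(mu, log_var)
    return q - beta * kl


# Example with a two-pixel "image" and a single latent factor (made-up numbers).
value = objective([1.0, 0.0], [0.5, 0.5], mu=[1.0], log_var=[0.0])
```

Under these assumptions Q rewards good reconstructions and the KL term is subtracted, so L would be maximized (or its negative minimized) when the parameters are adjusted. A value of B above one weights the KL term more heavily than the standard VAE objective, which corresponds to B = 1.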
20 Claims
1. A method performed by one or more computers for training a variational auto-encoder (VAE) to generate disentangled latent factors on a plurality of unlabeled training images, wherein the VAE has a plurality of parameters and is configured to receive an input image, process the input image to determine a latent representation of the input image that includes a plurality of latent factors, and to process the latent representation to generate a reconstruction of the input image, and wherein the method comprises:
- receiving the plurality of unlabeled training images, and, for each unlabeled training image:
- processing the unlabeled training image using the VAE to determine the latent representation of the unlabeled training image and to generate a reconstruction of the unlabeled training image in accordance with current values of the parameters of the VAE, and
- adjusting current values of the parameters of the VAE by determining a gradient of a loss function with respect to the parameters of the VAE, wherein the loss function L is of the form L = Q − B(KL), where Q is a term that depends on a quality of the reconstruction of the unlabeled training image, KL is a term that measures a degree of independence between the latent factors in the latent representation of the unlabeled training image, and B is a fixed value that is greater than one.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations for training a variational auto-encoder (VAE) to generate disentangled latent factors on a plurality of unlabeled training images, wherein the VAE has a plurality of parameters and is configured to receive an input image, process the input image to determine a latent representation of the input image that includes a plurality of latent factors, and to process the latent representation to generate a reconstruction of the input image, and wherein the operations comprise:
- receiving the plurality of unlabeled training images, and, for each unlabeled training image:
- processing the unlabeled training image using the VAE to determine the latent representation of the unlabeled training image and to generate a reconstruction of the unlabeled training image in accordance with current values of the parameters of the VAE, and
- adjusting current values of the parameters of the VAE by determining a gradient of a loss function with respect to the parameters of the VAE, wherein the loss function L is of the form L = Q − B(KL), where Q is a term that depends on a quality of the reconstruction of the unlabeled training image, KL is a term that measures a degree of independence between the latent factors in the latent representation of the unlabeled training image, and B is a fixed value that is greater than one.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A non-transitory computer-readable storage medium encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations for training a variational auto-encoder (VAE) to generate disentangled latent factors on a plurality of unlabeled training images, wherein the VAE is configured to receive an input image, process the input image to determine a latent representation of the input image that includes a plurality of latent factors, and to process the latent representation to generate a reconstruction of the input image, and wherein the operations comprise:
- receiving the plurality of unlabeled training images, and, for each unlabeled training image:
- processing the unlabeled training image using the VAE to determine the latent representation of the unlabeled training image and to generate a reconstruction of the unlabeled training image in accordance with current values of the parameters of the VAE, and
- adjusting current values of the parameters of the VAE by determining a gradient of a loss function with respect to the parameters of the VAE, wherein the loss function L is of the form L = Q − B(KL), where Q is a term that depends on a quality of the reconstruction of the unlabeled training image, KL is a term that measures a degree of independence between the latent factors in the latent representation of the unlabeled training image, and B is a fixed value that is greater than one.
Dependent claims: 17, 18, 19, 20
Specification