DISCRETE VARIATIONAL AUTO-ENCODER SYSTEMS AND METHODS FOR MACHINE LEARNING USING ADIABATIC QUANTUM COMPUTERS
Abstract
A computational system can include digital circuitry and analog circuitry, for instance a digital processor and a quantum processor. The quantum processor can operate as a sample generator providing samples. Samples can be employed by the digital processor in implementing various machine learning techniques. For example, the computational system can perform unsupervised learning over an input space, for example via a discrete variational auto-encoder, attempting to maximize the log-likelihood of an observed dataset. Maximizing the log-likelihood of the observed dataset can include generating a hierarchical approximating posterior.
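The abstract casts the quantum processor as a sample generator over a Boltzmann-distributed prior on discrete latent variables. As a purely classical, illustrative stand-in (not the patent's hardware method), a block-Gibbs sampler over a restricted Boltzmann machine produces the same kind of discrete samples; all names, shapes, and parameters below are assumptions made for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sample_rbm(W, b_v, b_h, n_samples=64, n_steps=100, rng=None):
    """Block-Gibbs sampling from an RBM p(v, h) proportional to
    exp(v @ W @ h + b_v @ v + b_h @ h), with binary units.

    Stands in for the quantum processor's role as a sample generator over
    a Boltzmann prior; a quantum annealer would return such samples natively.
    """
    rng = rng or np.random.default_rng(0)
    n_v, n_h = W.shape
    v = rng.integers(0, 2, size=(n_samples, n_v)).astype(float)
    h = np.zeros((n_samples, n_h))
    for _ in range(n_steps):
        # Alternate conditional updates: p(h|v) then p(v|h).
        h = (rng.random((n_samples, n_h)) < sigmoid(v @ W + b_h)).astype(float)
        v = (rng.random((n_samples, n_v)) < sigmoid(h @ W.T + b_v)).astype(float)
    return v, h

# Example usage with small illustrative weights.
rng = np.random.default_rng(7)
W = 0.1 * rng.standard_normal((16, 8))
v, h = gibbs_sample_rbm(W, np.zeros(16), np.zeros(8))
```

In the patent's setting, the digital processor would consume such samples during training while the analog (quantum) circuitry generates them.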
Claims
1. A method for unsupervised learning over an input space comprising discrete or continuous variables, and at least a subset of a training dataset of samples of the respective variables, to attempt to identify the value of at least one parameter that increases the log-likelihood of the at least a subset of a training dataset with respect to a model, the model expressible as a function of the at least one parameter, the method executed by circuitry including at least one processor and comprising:
forming a first latent space comprising a plurality of random variables, the plurality of random variables comprising one or more discrete random variables;
forming a second latent space comprising the first latent space and a set of supplementary continuous random variables;
forming a first transforming distribution comprising a conditional distribution over the set of supplementary continuous random variables, conditioned on the one or more discrete random variables of the first latent space;
forming an encoding distribution comprising an approximating posterior distribution over the first latent space, conditioned on the input space;
forming a prior distribution over the first latent space;
forming a decoding distribution comprising a conditional distribution over the input space conditioned on the set of supplementary continuous random variables;
determining an ordered set of conditional cumulative distribution functions of the supplementary continuous random variables, each cumulative distribution function comprising functions of a full distribution of at least one of the one or more discrete random variables of the first latent space;
determining an inversion of the ordered set of conditional cumulative distribution functions of the supplementary continuous random variables;
constructing a first stochastic approximation to a lower bound on the log-likelihood of the at least a subset of a training dataset;
constructing a second stochastic approximation to a gradient of the lower bound on the log-likelihood of the at least a subset of a training dataset; and
increasing the lower bound on the log-likelihood of the at least a subset of a training dataset based at least in part on the gradient of the lower bound on the log-likelihood of the at least a subset of a training dataset.
Dependent claims: 7, 8, 10, 11, 13, 14, 15, 16.
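The "ordered set of conditional cumulative distribution functions" and its inversion in claim 1 is what makes the discrete latent variables reparameterizable: uniform noise pushed through the inverse CDF yields a continuous variable zeta that is differentiable in the posterior parameters. A hedged sketch using one concrete transforming distribution, the spike-and-exponential r(zeta|z=0) = delta(zeta), r(zeta|z=1) = beta*exp(beta*zeta)/(exp(beta)-1) on [0, 1]; this is one instance of the family the claim covers, and the function name is illustrative.

```python
import numpy as np

def inverse_smoothing_cdf(q, rho, beta=5.0):
    """Invert the marginal CDF of the spike-and-exponential transform.

    q    : q(z=1 | x), approximating-posterior probability per binary latent.
    rho  : uniform(0, 1) noise of the same shape.
    beta : sharpness; as beta -> infinity, zeta approaches the discrete z.

    The marginal CDF conditioned on q is
        F(zeta) = (1 - q)*[zeta >= 0] + q*(exp(beta*zeta) - 1)/(exp(beta) - 1),
    so zeta = 0 when rho <= 1 - q (the spike), and otherwise the exponential
    branch below, which is smooth in q (reparameterization).
    """
    spike = rho <= (1.0 - q)
    t = np.clip((rho - (1.0 - q)) / np.maximum(q, 1e-12), 0.0, 1.0)
    zeta = np.log1p(t * np.expm1(beta)) / beta
    return np.where(spike, 0.0, zeta)

# Example usage: three latents with posteriors 0.1, 0.5, 0.9.
q = np.array([0.1, 0.5, 0.9])
rho = np.random.default_rng(0).random(3)
print(inverse_smoothing_cdf(q, rho))
```

For rho <= 1 - q the inverse CDF returns the spike at zeta = 0; otherwise the exponential branch varies smoothly with q, so gradients of a downstream decoder can flow back to the encoder even though the underlying latent variable is discrete.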
2. The method of claim 1 wherein increasing the lower bound on the log-likelihood of the at least a subset of a training dataset based at least in part on the gradient of the lower bound on the log-likelihood of the at least a subset of a training dataset includes increasing the lower bound on the log-likelihood of the at least a subset of a training dataset using a method of gradient descent.

3. The method of claim 2 wherein increasing the lower bound on the log-likelihood of the at least a subset of a training dataset using a method of gradient descent includes attempting to maximize the lower bound on the log-likelihood of the at least a subset of a training dataset using a method of gradient descent.
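Claims 2 and 3 phrase maximization of the bound as gradient descent: one descends the negative of the lower bound, which ascends the bound itself. A minimal, self-contained toy; the quadratic "bound" and all names are illustrative, not the patent's objective.

```python
import torch

def lower_bound(theta):
    # Toy stand-in for the lower bound L(theta); maximized at theta = 1.
    return -((theta - 1.0) ** 2).sum()

theta = torch.zeros(4, requires_grad=True)
opt = torch.optim.SGD([theta], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    loss = -lower_bound(theta)  # descending -L(theta) ascends L(theta)
    loss.backward()
    opt.step()
# theta is now close to 1, where the bound is maximized.
```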
4-6. (canceled)

9. (canceled)

12. (canceled)
17. A computational system, comprising:
at least one processor; and
at least one nontransitory processor-readable storage medium communicatively coupled to the at least one processor and storing processor-executable instructions which, when executed by the at least one processor, cause the at least one processor to:
form a first latent space comprising a plurality of random variables, the plurality of random variables comprising one or more discrete random variables;
form a second latent space comprising the first latent space and a set of supplementary continuous random variables;
form a first transforming distribution comprising a conditional distribution over the set of supplementary continuous random variables, conditioned on the one or more discrete random variables of the first latent space;
form an encoding distribution comprising an approximating posterior distribution over the first latent space, conditioned on the input space;
form a prior distribution over the first latent space;
form a decoding distribution comprising a conditional distribution over the input space conditioned on the set of supplementary continuous random variables;
determine an ordered set of conditional cumulative distribution functions of the supplementary continuous random variables, each cumulative distribution function comprising functions of a full distribution of at least one of the one or more discrete random variables of the first latent space;
determine an inversion of the ordered set of conditional cumulative distribution functions of the supplementary continuous random variables;
construct a first stochastic approximation to a lower bound on the log-likelihood of the at least a subset of a training dataset;
construct a second stochastic approximation to a gradient of the lower bound on the log-likelihood of the at least a subset of a training dataset; and
increase the lower bound on the log-likelihood of the at least a subset of a training dataset based at least in part on the gradient of the lower bound on the log-likelihood of the at least a subset of a training dataset.
18. A method for unsupervised learning by a computational system, the method executable by circuitry including at least one processor and comprising:
forming a model, the model comprising one or more model parameters;
initializing the model parameters;
receiving a training dataset comprising a plurality of subsets of the training dataset;
testing to determine if a stopping criterion has been met;
in response to determining the stopping criterion has not been met:
fetching a mini-batch comprising one of the plurality of subsets of the training dataset, the mini-batch comprising input data;
performing propagation through an encoder that computes an approximating posterior distribution over a discrete space;
sampling from the approximating posterior distribution over a set of continuous random variables via a sampler;
performing propagation through a decoder that computes an auto-encoded distribution over the input data;
performing backpropagation through the decoder of a log-likelihood of the input data with respect to the auto-encoded distribution over the input data;
performing backpropagation through the sampler that samples from the approximating posterior distribution over the set of continuous random variables to generate an auto-encoding gradient;
determining a first gradient of a KL-divergence, with respect to the approximating posterior, between the approximating posterior distribution and a true prior distribution over the discrete space;
performing backpropagation through the encoder of a sum of the auto-encoding gradient and the first gradient of the KL-divergence with respect to the approximating posterior;
determining a second gradient of a KL-divergence, with respect to parameters of the true prior distribution, between the approximating posterior and the true prior distribution over the discrete space;
determining at least one of a gradient or at least a stochastic approximation of a gradient, of a bound on the log-likelihood of the input data; and
updating the model parameters based at least in part on the determined at least one of the gradient or at least a stochastic approximation of the gradient, of the bound on the log-likelihood of the input data.
Dependent claims: 22, 23, 24, 27, 28, 29, 33, 34, 35.
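Read end to end, claim 18 is a stochastic-gradient training loop with the discrete-VAE pieces slotted in. A compact sketch under loudly simplifying assumptions: a factorial Bernoulli prior replaces the Boltzmann (quantum-sampled) prior so the KL term is closed-form, the sampler is the spike-and-exponential inverse CDF from the earlier sketch, and every module name, size, and hyperparameter is illustrative.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteVAE(nn.Module):
    """Sketch of the claim-18 loop. A factorial Bernoulli prior stands in
    for the patent's Boltzmann (quantum-sampled) prior so KL(q || p) is
    closed-form; an illustrative simplification, not the claimed system."""

    def __init__(self, n_in=784, n_hid=256, n_lat=64, beta=5.0):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hid), nn.ReLU(),
                                     nn.Linear(n_hid, n_lat))
        self.decoder = nn.Sequential(nn.Linear(n_lat, n_hid), nn.ReLU(),
                                     nn.Linear(n_hid, n_in))
        self.prior_logits = nn.Parameter(torch.zeros(n_lat))  # true prior params
        self.beta = beta

    def sample_zeta(self, q):
        # Sampler: inverse CDF of the spike-and-exponential smoothing
        # transform; differentiable in q on the exponential branch.
        rho = torch.rand_like(q)
        t = ((rho - (1 - q)) / q.clamp_min(1e-6)).clamp(0, 1)
        zeta = torch.log1p(t * math.expm1(self.beta)) / self.beta
        return torch.where(rho <= 1 - q, torch.zeros_like(zeta), zeta)

    def elbo(self, x):
        q = torch.sigmoid(self.encoder(x)).clamp(1e-6, 1 - 1e-6)  # q(z=1|x)
        zeta = self.sample_zeta(q)            # continuous latent sample
        logits = self.decoder(zeta)           # auto-encoded distribution
        rec = -F.binary_cross_entropy_with_logits(
            logits, x, reduction='none').sum(1)     # log p(x | zeta)
        p = torch.sigmoid(self.prior_logits).clamp(1e-6, 1 - 1e-6)
        kl = (q * torch.log(q / p)
              + (1 - q) * torch.log((1 - q) / (1 - p))).sum(1)
        return (rec - kl).mean()  # stochastic bound on the log-likelihood

model = DiscreteVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.rand(512, 784).round()         # toy binary training dataset
for step in range(100):                     # stopping criterion: step budget
    x = data[torch.randint(0, 512, (64,))]  # fetch a mini-batch
    loss = -model.elbo(x)
    opt.zero_grad()
    loss.backward()  # backprop through decoder, sampler, and encoder,
                     # plus KL gradients w.r.t. posterior and prior params
    opt.step()       # update the model parameters
```

Autograd supplies both of claim 18's KL gradients at once: backpropagation through decoder, sampler, and encoder yields the auto-encoding gradient plus the first KL gradient, and the graph through prior_logits yields the second KL gradient with respect to the true prior's parameters.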
19-21. (canceled)

25-26. (canceled)

30-32. (canceled)

36-42. (canceled)