Collapsed Gibbs sampler for sparse topic models and discrete matrix factorization
Abstract
In an inference system for organizing a corpus of objects, feature representations are generated comprising distributions over a set of features corresponding to the objects. A topic model defining a set of topics is inferred by performing latent Dirichlet allocation (LDA) with an Indian Buffet Process (IBP) compound Dirichlet prior probability distribution. The inference is performed using a collapsed Gibbs sampling algorithm by iteratively sampling (1) topic allocation variables of the LDA and (2) binary activation variables of the IBP compound Dirichlet prior. In some embodiments the inference is configured such that each inferred topic model is a clean topic model with topics defined as distributions over sub-sets of the set of features selected by the prior. In some embodiments the inference is configured such that the inferred topic model associates a focused sub-set of the set of topics to each object of the training corpus.
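As a concrete reading of the model the abstract describes, the generative side can be sketched as follows: each object draws its topic proportions from a Dirichlet restricted to the topics its binary activation row selects, then draws each word from its allocated topic. This is an illustrative sketch only, assuming a fixed binary activation matrix `b`; all function and variable names are hypothetical, not taken from the patent.

```python
import numpy as np

def generate_corpus(topic_word, b, alpha, doc_lengths, rng):
    """Illustrative sketch: draw each object's topic proportions from a
    Dirichlet restricted to the topics its binary activation row b[d]
    selects, then draw each word from its allocated topic."""
    K, V = topic_word.shape
    docs = []
    for d, n_d in enumerate(doc_lengths):
        active = np.flatnonzero(b[d])              # focused sub-set of topics
        theta = np.zeros(K)
        theta[active] = rng.dirichlet(np.full(active.size, alpha))
        words = []
        for _ in range(n_d):
            k = rng.choice(K, p=theta)             # topic allocation variable
            words.append(int(rng.choice(V, p=topic_word[k])))
        docs.append(words)
    return docs
```

Because inactive topics receive zero probability mass, each generated object uses only the "focused sub-set" of topics the prior activates for it, which is the sparsity property the abstract highlights.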
21 Claims
1. A non-transitory storage medium storing instructions executable by a processor to perform a method comprising:
generating feature representations comprising distributions over a set of features corresponding to objects of a training corpus of objects; and
inferring a topic model defining a set of topics by performing latent Dirichlet allocation (LDA) with an Indian Buffet Process (IBP) compound Dirichlet prior probability distribution, the inferring being performed using a collapsed Gibbs sampling algorithm by iteratively sampling (1) topic allocation variables of the LDA and (2) binary activation variables of the IBP compound Dirichlet prior probability distribution;
wherein the inferring performed using a collapsed Gibbs sampling algorithm does not iteratively sample any parameters other than topic allocation variables of the LDA and binary activation variables of the IBP compound Dirichlet prior probability distribution.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
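The iterative sampling recited in claim 1 can be illustrated with a minimal collapsed Gibbs sweep over the topic allocation variables, in which each candidate topic's probability is masked by the binary activation variables. This is a sketch under stated assumptions: the activation matrix `b` is held fixed (resampling of the binary activation variables themselves, the second quantity the claim recites, is omitted), the count statistics are the standard collapsed-LDA ones, and every name is hypothetical.

```python
import numpy as np

def gibbs_sweep(z, docs, b, n_dk, n_kw, n_k, alpha, beta, rng):
    """One sweep of collapsed Gibbs over topic allocations z.

    b[d, k] is the binary activation variable: topic k may only be
    assigned in document d when b[d, k] == 1.  n_dk, n_kw, n_k are the
    usual document-topic, topic-word, and topic counts.
    """
    V = n_kw.shape[1]
    for d in range(len(docs)):
        for i, w in enumerate(docs[d]):
            k_old = z[d][i]
            # remove the current assignment from the counts
            n_dk[d, k_old] -= 1
            n_kw[k_old, w] -= 1
            n_k[k_old] -= 1
            # collapsed conditional, masked to the active topics of d
            p = b[d] * (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            p /= p.sum()
            k_new = rng.choice(len(p), p=p)
            # record the new assignment and restore the counts
            z[d][i] = k_new
            n_dk[d, k_new] += 1
            n_kw[k_new, w] += 1
            n_k[k_new] += 1
    return z
```

Collapsing (integrating out the topic proportions and topic-word distributions) is what lets the sampler iterate over only the allocation and activation variables, matching the "does not iteratively sample any parameters other than" limitation.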
13. A method comprising:
generating feature representations comprising distributions over a set of features corresponding to objects d = 1, …, D of a training corpus of D objects; and
inferring a generative topic model defining a set of topics k = 1, …, K by performing a latent generative topic model allocation using a collapsed Gibbs sampling algorithm with an Indian Buffet Process (IBP) compound prior probability distribution having binary activation variables Θ ∈ {0, 1}^(K×D) and object-specific topic proportions θ_d | Θ_d ~ Dirichlet(α ∘ Θ_d) with weights α that are the same for all the objects d = 1, …, D;
wherein the inferring includes iterative sampling of (1) topic allocation variables of the generative topic model allocation and (2) the binary activation variables Θ of the IBP compound prior probability distribution; and
wherein the generating and inferring are performed by a digital processor.
Dependent claims: 14, 15, 16, 17, 18, 19, 20.
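The compound prior recited in claim 13 can be illustrated directly: multiplying the shared weights α elementwise by a binary activation column Θ_d zeroes out inactive topics, so the draw reduces to a Dirichlet over the active topics only. A minimal sketch, with all names hypothetical:

```python
import numpy as np

def sample_topic_proportions(alpha, theta_col, rng):
    """Draw theta_d ~ Dirichlet(alpha * Theta_d): the elementwise product
    of the shared weights alpha and the binary activation column Theta_d
    gives inactive topics zero weight, hence zero probability mass."""
    theta = np.zeros(alpha.shape[0])
    active = np.flatnonzero(theta_col)
    theta[active] = rng.dirichlet(alpha[active])
    return theta
```

Note that the weights α are shared across all objects d = 1, …, D, as the claim requires; only the binary column Θ_d varies per object.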
21. An apparatus comprising:
a digital processor configured to perform a method including:
generating feature representations comprising distributions over a set of features corresponding to documents d = 1, …, D of a training corpus of D documents; and
inferring a generative latent Dirichlet allocation (LDA) or probabilistic latent semantic analysis (PLSA) topic model defining a set of topics k = 1, …, K by performing a latent generative topic model allocation using a collapsed Gibbs sampling algorithm with an Indian Buffet Process (IBP) compound prior probability distribution having binary activation variables Θ ∈ {0, 1}^(K×D) and document-specific topic proportions θ_d | Θ_d ~ Dirichlet(α ∘ Θ_d) with weights α that are the same for all the documents d = 1, …, D;
wherein the inferring includes iterative sampling of (1) topic allocation variables of the generative topic model allocation and (2) the binary activation variables Θ of the IBP compound prior probability distribution.
Specification