Method and apparatus for learning a probabilistic generative model for text
First Claim
1. A method for constructing a model that generates text, the method comprising:
in a computer system, performing the operations of:
representing a concept as a cluster node;
representing a word as a terminal node;
assigning a weight to a link between two nodes; and
training the model based on a set of documents, comprising:
for each cluster node, computing a probabilistic cost of a corresponding concept existing in a document but not triggering any words.
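The final step of the claim can be illustrated with a small sketch. The claim text shown here does not give a formula, so the code below rests on an assumption: each link from a cluster node to a word fires independently with probability equal to its weight, and the "probabilistic cost" is taken to be the negative log of the probability that none of those links fires. The function name `silent_cost` and the example weights are hypothetical.

```python
import math

def silent_cost(word_weights):
    """Cost of a cluster being active in a document yet emitting no words.

    word_weights: dict mapping word -> link weight in [0, 1].
    ASSUMPTION (not from the claim text): links fire independently, and the
    cost is -log of the probability that every link stays silent.
    """
    p_silent = 1.0
    for weight in word_weights.values():
        p_silent *= (1.0 - weight)
    # A weight of 1.0 makes silence impossible, so the cost is infinite.
    if p_silent == 0.0:
        return math.inf
    return -math.log(p_silent)

# Hypothetical cluster with two outgoing word links.
cost = silent_cost({"dog": 0.5, "cat": 0.5})
```

Under this assumption, a cluster with many strong word links is expensive to keep active in a document that contains none of its words, which is the intuition the claim language suggests.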
Abstract
One embodiment of the present invention provides a system that learns a generative model for textual documents. During operation, the system receives a current model, which contains terminal nodes representing random variables for words and cluster nodes representing clusters of conceptually related words. Within the current model, nodes are coupled together by weighted links, so that if a cluster node in the probabilistic model fires, a weighted link from the cluster node to another node causes the other node to fire with a probability proportionate to the link weight. The system also receives a set of training documents, wherein each training document contains a set of words. Next, the system applies the set of training documents to the current model to produce a new model.
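The firing behaviour described in the abstract can be sketched as a small sampler. This is an illustrative sketch only, not the patented implementation: it assumes link weights are already probabilities in [0, 1], that any node without outgoing links is a terminal (word) node, and that links fire independently. The names `sample_words`, `C_pets`, and `C_food` are hypothetical.

```python
import random

def sample_words(links, active_clusters, rng):
    """Sample the set of words generated by a set of firing cluster nodes.

    links: dict mapping cluster node -> {child node: weight in [0, 1]}.
    ASSUMPTION: a node that never appears as a key in `links` is a terminal
    (word) node; each link fires independently with probability = weight.
    """
    fired_words = set()
    frontier = list(active_clusters)
    seen = set(frontier)
    while frontier:
        node = frontier.pop()
        for child, weight in links.get(node, {}).items():
            if rng.random() < weight:          # the link fires
                if child in links:             # child is another cluster node
                    if child not in seen:
                        seen.add(child)
                        frontier.append(child)
                else:                          # child is a word
                    fired_words.add(child)
    return fired_words

# Hypothetical two-cluster model; weights of 1.0 and 0.0 make it deterministic.
links = {
    "C_pets": {"dog": 1.0, "cat": 0.0, "C_food": 1.0},
    "C_food": {"kibble": 1.0},
}
words = sample_words(links, ["C_pets"], random.Random(0))
```

With fractional weights the same call returns a different word set from run to run, which is what makes the model generative rather than a fixed lookup.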
42 Claims
1. A method for constructing a model that generates text, the method comprising:
in a computer system, performing the operations of:
representing a concept as a cluster node;
representing a word as a terminal node;
assigning a weight to a link between two nodes; and
training the model based on a set of documents, comprising:
for each cluster node, computing a probabilistic cost of a corresponding concept existing in a document but not triggering any words.
Dependent claims: 2-14
15. A computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for constructing a model that generates text, the method comprising:
representing a concept as a cluster node;
representing a word as a terminal node;
assigning a weight to a link between two nodes; and
training the model based on a set of documents, comprising:
for each cluster node, computing a probabilistic cost of a corresponding concept existing in a document but not triggering any words.
Dependent claims: 16-28
29. An apparatus for generating text, the apparatus comprising:
a modeling mechanism in a computer system configured to:
represent a concept as a cluster node;
represent a word as a terminal node;
assign a weight to a link from a first node to a second node, wherein the weight indicates a probability of the second node triggering additional nodes when the first node triggers the second node;
a model-training mechanism in the computer system configured to train the model based on a set of documents, wherein the training comprises:
for each cluster node, computing a probabilistic cost of a corresponding concept existing in a document but not triggering any words.
Dependent claims: 30-42
Specification