Method and apparatus for selecting links to include in a probabilistic generative model for text
Abstract
A method may include receiving, at one or more processors, a current model. The current model may include a group of nodes representing words, at least one cluster of nodes representing related words, and a group of links. Each link may connect two nodes of the group of nodes. Each link may include a corresponding weight. The method may further include applying, by one or more processors, a set of training documents to the model to produce new weights for the group of links to create a new model; and making, by one or more processors, the new model the current model.
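The update cycle described in the abstract can be sketched as a simple loop. This is a minimal illustration, not the patented method: the abstract does not say how new weights are computed, so the re-estimation rule below (the fraction of documents containing both endpoints of a link) is purely an assumption, and the names `apply_documents` and `train` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]     # a link connects two nodes
Model = Dict[Link, float]  # each link carries a weight


def apply_documents(model: Model, documents: List[Set[str]]) -> Model:
    """Return a new model whose links carry re-estimated weights.

    Assumption: the weight is the fraction of documents that contain
    both endpoints of the link. The source does not define this rule.
    """
    new_model: Model = {}
    for (a, b) in model:
        hits = sum(1 for doc in documents if a in doc and b in doc)
        new_model[(a, b)] = hits / len(documents) if documents else 0.0
    return new_model


def train(model: Model, documents: List[Set[str]], iterations: int = 3) -> Model:
    """Repeatedly apply the documents; each new model becomes the current one."""
    current = model
    for _ in range(iterations):
        new_model = apply_documents(current, documents)
        current = new_model  # the new model is made the current model
    return current
```

The loop keeps the set of links fixed and only changes their weights, which matches the abstract's wording; pruning links is the subject of the claims.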
14 Claims
1. A computer-implemented method comprising:
applying training documents to links in a current generative model that each connect a respective terminal node that represents a corresponding word to a respective cluster node that represents a corresponding cluster of conceptually related words, to determine a respective expected count for each of the links;
selecting, from the links that each connect a respective terminal node that represents a corresponding word to a respective cluster node that represents a corresponding cluster of conceptually related words, a first subset of one or more links that are each associated with more than a predetermined number of sources of the training documents;
for each selected link of the first subset, determining (i) a significance of the link, and (ii) a link rating for the link based on the expected count for the link and the significance;
ranking the selected links of the first subset based on the link ratings;
selecting a second subset of the ranked links; and
generating a new generative model using only the selected links of the second subset, without using any of the links in the current generative model that were not selected for the second subset.
Dependent claims: 2, 3, 4, 5, 6, 7
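The steps of claim 1 can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the claimed implementation: expected counts, per-link source sets, and significance values are taken as precomputed inputs; the link rating is assumed to be the product of expected count and significance (the claim says only that it is "based on" both); and the second subset is assumed to be the top `keep` ranked links. The names `select_links` and `build_new_model` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]  # (terminal word node, cluster node)


def select_links(
    expected_counts: Dict[Link, float],
    link_sources: Dict[Link, Set[str]],
    significance: Dict[Link, float],
    min_sources: int,
    keep: int,
) -> List[Link]:
    """Filter links by source count, rate them, rank them, keep the top ones."""
    # First subset: links associated with MORE than `min_sources` sources.
    first_subset = [
        link for link in expected_counts
        if len(link_sources.get(link, set())) > min_sources
    ]
    # Assumed rating: expected count multiplied by significance.
    ratings = {
        link: expected_counts[link] * significance.get(link, 0.0)
        for link in first_subset
    }
    # Rank by rating, highest first, then take the second subset.
    ranked = sorted(first_subset, key=lambda link: ratings[link], reverse=True)
    return ranked[:keep]


def build_new_model(
    current_model: Dict[Link, float], selected: List[Link]
) -> Dict[Link, float]:
    """New model containing only the selected links; all others are dropped."""
    return {link: current_model[link] for link in selected}
```

A link with a large expected count can still be dropped by the source-count filter if all of its evidence comes from too few sources, which is the point of the first selection step.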
8. A system comprising:
a non-transitory computer readable medium having instructions stored thereon; and
data processing apparatus programmed to execute the instructions to perform operations comprising:
applying training documents to links in a current generative model that each connect a respective terminal node that represents a corresponding word to a respective cluster node that represents a corresponding cluster of conceptually related words, to determine a respective expected count for each of the links;
selecting, from the links that each connect a respective terminal node that represents a corresponding word to a respective cluster node that represents a corresponding cluster of conceptually related words, a first subset of one or more links that are each associated with more than a predetermined number of sources of the training documents;
for each selected link of the first subset, determining (i) a significance of the link, and (ii) a link rating for the link based on the expected count for the link and the significance;
ranking the selected links of the first subset based on the link ratings;
selecting a second subset of the ranked links; and
generating a new generative model using only the selected links of the second subset, without using any of the links in the current generative model that were not selected for the second subset.
Dependent claims: 9, 10, 11, 12, 13, 14
Specification