Selectively merging clusters of conceptually related words in a generative model for text
First Claim
1. A computer-implemented method comprising:
receiving, by at least one processor, a search query comprising one or more terms from a device of a user;
in response to the received search query, accessing, by the at least one processor, data identifying contents of one or more web pages;
obtaining, by the at least one processor, a probabilistic generative model that includes (i) a first node that descends from one or more parent nodes, and (ii) one or more child nodes that descend from a second node;
determining, by the at least one processor, to merge the first node that descends from the one or more parent nodes of the first node with the second node from which the one or more child nodes of the second node descend;
in response to determining to merge the first node that descends from the one or more parent nodes of the first node with the second node from which the one or more child nodes of the second node descend, determining that, before the first node and the second node are merged, a particular node of the probabilistic generative model is both (i) one of the one or more parent nodes of the first node, and (ii) one of the one or more child nodes of the second node, then, after the first node and the second node are merged, designating the particular node as (i) a child node that descends from a combined node that results from merging the first node with the second node, and (ii) not a parent node from which the combined node that results from merging the first node with the second node descends;
generating, by the at least one processor, a first concept characterizing the search query and one or more second concepts characterizing each of the web pages using the probabilistic generative model;
determining, by the at least one processor, that at least one of the second concepts matches the first concept, the at least one second concept being associated with at least one of the web pages; and
transmitting, to the user device, a response to the search query that identifies the at least one of the web pages.
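The merge step recited above can be illustrated with a minimal sketch. It is not the patented implementation: the `Model` class, the `merge_nodes` function, and the node names are hypothetical, and the model is reduced to a bare parent/child graph with no link weights. The sketch also assumes that any node ending up as both parent and child of the combined node is kept as a child only, which is slightly broader than the single "particular node" case in the claim.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    # parents[n] holds the set of nodes that n descends from (hypothetical layout).
    parents: dict[str, set[str]] = field(default_factory=dict)

    def add_link(self, parent: str, child: str) -> None:
        self.parents.setdefault(parent, set())
        self.parents.setdefault(child, set()).add(parent)

    def children(self, node: str) -> set[str]:
        return {n for n, ps in self.parents.items() if node in ps}


def merge_nodes(model: Model, first: str, second: str, combined: str) -> Model:
    """Return a new model in which `first` and `second` are replaced by `combined`."""
    merged_away = {first, second}
    new_children = (model.children(first) | model.children(second)) - merged_away
    new_parents = (model.parents[first] | model.parents[second]) - merged_away
    # A node that was a parent of one merged node and a child of the other
    # would be both parent and child of the combined node. Designate it as
    # a child only, so the merged graph stays acyclic (assumption noted above).
    new_parents -= new_children

    merged = Model()
    for node, ps in model.parents.items():
        if node in merged_away:
            continue
        # Redirect links that pointed from a merged node to the combined node.
        merged.parents[node] = {combined if p in merged_away else p for p in ps}
    merged.parents[combined] = new_parents
    return merged


# Example: X is a child of "second" and a parent of "first" before the merge.
m = Model()
for parent, child in [("R", "second"), ("second", "X"), ("X", "first"),
                      ("P", "first"), ("first", "W")]:
    m.add_link(parent, child)
merged = merge_nodes(m, "first", "second", "C")
```

In this example X descends from the combined node C after the merge and is no longer one of its parents, while R and P remain parents of C.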
Abstract
One embodiment of the present invention provides a system that merges similar clusters of conceptually related words in a probabilistic generative model for textual documents. During operation, the system receives a current model, which contains terminal nodes representing random variables for words and contains cluster nodes representing clusters of conceptually related words. Nodes in the current model are coupled together by weighted links, wherein if a node fires, a link from the node to another node causes the other node to fire with a probability proportionate to the weight of the link. Next, the system determines whether cluster nodes in the current model explain other cluster nodes in the current model. If two cluster nodes explain each other, the system merges the two cluster nodes to form a combined cluster node.
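The firing behavior described in the abstract can be sketched as a simple random propagation over weighted links. This is an illustration under stated assumptions, not the patent's method: the `propagate` function and the `links` layout are hypothetical, each weight is assumed to lie in [0, 1] and to be used directly as the firing probability, and each link is assumed to fire independently.

```python
import random


def propagate(links, active, rng):
    """Sample which nodes fire, starting from the initially active nodes.

    links: dict mapping a node to a list of (child, weight) pairs (hypothetical layout).
    active: iterable of nodes assumed to have fired already.
    rng: a random.Random instance, passed in so runs can be reproduced.
    """
    fired = set(active)
    frontier = list(fired)
    while frontier:
        node = frontier.pop()
        for child, weight in links.get(node, []):
            # Assumption: the link fires the child with probability equal to its weight.
            if child not in fired and rng.random() < weight:
                fired.add(child)
                frontier.append(child)
    return fired


# Weights of 1.0 and 0.0 make this small example deterministic.
links = {
    "pets": [("dog", 1.0), ("cat", 1.0), ("car", 0.0)],
    "dog": [("bark", 1.0)],
}
fired = propagate(links, {"pets"}, random.Random(0))
```

Here "pets" plays the role of a cluster node and the remaining names stand in for terminal word nodes; with fractional weights the returned set would vary from run to run.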
21 Claims
1. A computer-implemented method, recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system comprising:
a non-transitory computer readable medium having instructions stored thereon; and
data processing apparatus programmed to execute the instructions to perform operations comprising:
receiving, from a device of a user, a search query comprising one or more terms;
in response to the received search query, accessing data identifying contents of one or more web pages;
obtaining a probabilistic generative model that includes (i) a first node that descends from one or more parent nodes, and (ii) one or more child nodes that descend from a second node;
determining to merge the first node that descends from the one or more parent nodes of the first node with the second node from which the one or more child nodes of the second node descend;
in response to determining to merge the first node that descends from the one or more parent nodes of the first node with the second node from which the one or more child nodes of the second node descend, determining that, before the first node and the second node are merged, a particular node of the probabilistic generative model is both (i) one of the one or more parent nodes of the first node, and (ii) one of the one or more child nodes of the second node, then, after the first node and the second node are merged, designating the particular node as (i) a child node that descends from a combined node that results from merging the first node with the second node, and (ii) not a parent node from which the combined node that results from merging the first node with the second node descends;
generating a first concept characterizing the search query and one or more second concepts characterizing each of the web pages using the probabilistic generative model;
determining that at least one of the second concepts matches the first concept, the at least one second concept being associated with at least one of the web pages; and
transmitting, to the user device, a response to the search query that identifies the at least one of the web pages.
Dependent claims: 11, 12, 13, 14, 15, 16
17. A non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, cause the processor to perform operations comprising:
receiving, from a device of a user, a search query comprising one or more terms;
in response to the received search query, accessing data identifying contents of one or more web pages;
obtaining a probabilistic generative model that includes (i) a first node that descends from one or more parent nodes, and (ii) one or more child nodes that descend from a second node;
determining to merge the first node that descends from the one or more parent nodes of the first node with the second node from which the one or more child nodes of the second node descend;
in response to determining to merge the first node that descends from the one or more parent nodes of the first node with the second node from which the one or more child nodes of the second node descend, determining that, before the first node and the second node are merged, a particular node of the probabilistic generative model is both (i) one of the one or more parent nodes of the first node, and (ii) one of the one or more child nodes of the second node, then, after the first node and the second node are merged, designating the particular node as (i) a child node that descends from a combined node that results from merging the first node with the second node, and (ii) not a parent node from which the combined node that results from merging the first node with the second node descends;
generating a first concept characterizing the search query and one or more second concepts characterizing each of the web pages using the probabilistic generative model;
determining that at least one of the second concepts matches the first concept, the at least one second concept being associated with at least one of the web pages; and
transmitting, to the user device, a response to the search query that identifies the at least one of the web pages.
Dependent claims: 18, 19, 20, 21
Specification