Method and apparatus for characterizing documents based on clusters of related words
Abstract
One embodiment of the present invention provides a system that characterizes a document with respect to clusters of conceptually related words. Upon receiving a document containing a set of words, the system selects "candidate clusters" of conceptually related words that are related to the set of words. These candidate clusters are selected using a model that explains how sets of words are generated from clusters of conceptually related words. Next, the system constructs a set of components to characterize the document, wherein the set of components includes components for candidate clusters. Each component in the set of components indicates a degree to which a corresponding candidate cluster is related to the set of words.
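The abstract does not say how candidate clusters are chosen or how the "degree" in each component is computed. The Python sketch below shows one possible reading, under assumptions that are not taken from the patent: the generative model is treated as a noisy-OR network in which each cluster has a prior probability of being active and each cluster-to-word link weight is the probability that an active cluster causes the word to appear; a candidate cluster is any cluster linked to at least one document word; and each component is the posterior probability that its cluster is active given the document. The names `PRIORS`, `LINKS`, `LEAK`, `select_candidates`, and `characterize`, and all of the toy data, are hypothetical. Exact enumeration is exponential in the number of candidates, so this only suits a handful of them.

```python
from itertools import product
from math import prod

# Hypothetical toy model (not from the patent): a prior probability that each
# cluster is active, and cluster->word link weights giving the chance that an
# active cluster causes the word to appear.
PRIORS = {"pets": 0.1, "finance": 0.1, "cooking": 0.1}
LINKS = {
    "pets": {"dog": 0.6, "cat": 0.6, "food": 0.2},
    "finance": {"bank": 0.7, "loan": 0.5},
    "cooking": {"food": 0.7, "recipe": 0.6},
}
LEAK = 0.01  # chance a word appears with no active cluster causing it


def select_candidates(words):
    """Candidate clusters: those linked to at least one word in the document."""
    return [c for c, links in LINKS.items() if links.keys() & words]


def word_prob(word, active):
    """Noisy-OR: the word is absent only if the leak and every active cluster fail."""
    fail = (1 - LEAK) * prod(1 - LINKS[c].get(word, 0.0) for c in active)
    return 1 - fail


def characterize(doc_words):
    """Map each candidate cluster to P(cluster active | document words).

    Exact enumeration over candidate on/off states. Simplifications: clusters
    that are not candidates are held inactive, and only the document's words
    plus words linked from candidates are scored (as present or absent).
    """
    words = set(doc_words)
    cands = select_candidates(words)
    vocab = words.union(*(LINKS[c].keys() for c in cands))
    mass = dict.fromkeys(cands, 0.0)
    total = 0.0
    for states in product([False, True], repeat=len(cands)):
        active = [c for c, on in zip(cands, states) if on]
        p = prod(PRIORS[c] if on else 1 - PRIORS[c] for c, on in zip(cands, states))
        for w in vocab:
            pw = word_prob(w, active)
            p *= pw if w in words else 1 - pw
        total += p
        for c in active:
            mass[c] += p
    return {c: mass[c] / total for c in cands}


print(characterize(["dog", "food"]))
```

With these toy numbers, a document containing "dog" and "food" yields components for the pets and cooking clusters only, and pets scores well above cooking because only pets can explain "dog".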
62 Claims
1. A method for characterizing a document with respect to clusters of conceptually related words, comprising:
receiving the document, wherein the document contains a set of words;
selecting candidate clusters of conceptually related words that are related to the set of words;
wherein the candidate clusters are selected using a model that explains how sets of words are generated from clusters of conceptually related words; and
constructing a set of components to characterize the document, wherein the set of components includes components for candidate clusters, wherein each component indicates a degree to which a corresponding candidate cluster is related to the set of words.
Dependent claims: 2 through 20.
21. A computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for characterizing a document with respect to clusters of conceptually related words, the method comprising:
receiving the document, wherein the document contains a set of words;
selecting candidate clusters of conceptually related words that are related to the set of words;
wherein the candidate clusters are selected using a model that explains how sets of words are generated from clusters of conceptually related words; and
constructing a set of components to characterize the document, wherein the set of components includes components for candidate clusters, wherein each component indicates a degree to which a corresponding candidate cluster is related to the set of words.
Dependent claims: 22 through 40.
41. An apparatus for characterizing a document with respect to clusters of conceptually related words, comprising:
a receiving mechanism, configured to receive the document, wherein the document contains a set of words;
a selection mechanism configured to select candidate clusters of conceptually related words that are related to the set of words;
wherein the candidate clusters are selected using a model that explains how sets of words are generated from clusters of conceptually related words; and
a component construction mechanism configured to construct a set of components to characterize the document, wherein the set of components includes components for candidate clusters, wherein each component indicates a degree to which a corresponding candidate cluster is related to the set of words.
Dependent claims: 42 through 57, and 60.
58. The apparatus of claim 58, wherein while performing the hill-climbing operations, the component construction mechanism is configured to periodically change states of individual candidate clusters without regard to an objective function for the hill-climbing operations, to explore states of the probabilistic model that are otherwise unreachable through hill-climbing operations.
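Claim 58 describes hill-climbing over states of the probabilistic model, with periodic changes to individual candidate clusters made without regard to the objective function. The Python sketch below illustrates that idea under assumptions not stated in the claim: each candidate cluster has a binary on/off state, a hill-climbing move flips one cluster, and every `kick_every` steps one randomly chosen cluster is flipped unconditionally. The names `hill_climb`, `kick_every`, and `toy_objective` are hypothetical, and the deliberately deceptive toy objective stands in for whatever score of a state the model would actually supply.

```python
import random


def hill_climb(objective, n, steps=200, kick_every=25, seed=0):
    """Single-flip hill climbing over n binary cluster states (hypothetical sketch).

    Every `kick_every` steps, one randomly chosen state is flipped regardless of
    the objective, so the search can leave a local optimum. Returns the best
    state seen and its score.
    """
    rng = random.Random(seed)
    state = [False] * n
    score = objective(state)
    best, best_score = state[:], score
    for step in range(1, steps + 1):
        if step % kick_every == 0:
            i = rng.randrange(n)
            state[i] = not state[i]  # forced move: the objective is ignored
            score = objective(state)
        else:
            # Try every single flip; keep the one that improves the score most.
            best_i, best_new = None, score
            for i in range(n):
                state[i] = not state[i]
                s = objective(state)
                state[i] = not state[i]  # undo the trial flip
                if s > best_new:
                    best_i, best_new = i, s
            if best_i is not None:
                state[best_i] = not state[best_i]
                score = best_new
        if score > best_score:
            best, best_score = state[:], score
    return best, best_score


def toy_objective(state):
    """Hypothetical deceptive score for n=3: all-off is a local optimum, all-on is global."""
    return {0: 1, 1: 0, 2: 2, 3: 10}[sum(state)]


print(hill_climb(toy_objective, 3))                   # periodic forced flips enabled
print(hill_climb(toy_objective, 3, kick_every=1000))  # forced flips never trigger
```

On this toy objective, plain hill climbing stays at the all-off local optimum, because every single flip from there lowers the score; the periodic forced flip moves the search into a region from which ordinary uphill moves reach the all-on state.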
61. A computer-readable storage medium containing a data structure that facilitates characterizing a document with respect to clusters of conceptually related words, the data structure comprising:
a probabilistic model that contains nodes representing random variables for words and for clusters of conceptually related words;
wherein nodes in the probabilistic model are coupled together by weighted links;
wherein if a cluster node in the probabilistic model fires, a weighted link from the cluster node to another node can cause the other node to fire; and
wherein the other node can be associated with a word or a cluster.
Dependent claim: 62.
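Claim 61 describes a data structure in which word nodes and cluster nodes are joined by weighted links, and a firing cluster node can cause a linked word or cluster node to fire. The Python sketch below stores such a network and samples from it. It assumes, beyond what the claim states, that a link weight is the independent probability that the link causes its target to fire (a noisy-OR style reading), that only cluster nodes have outgoing links, and that sampling starts from a single root cluster that always fires. `Node`, `Model`, `add`, `link`, and `sample` are hypothetical names.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    is_cluster: bool
    links: dict = field(default_factory=dict)  # target name -> weight in [0, 1]


@dataclass
class Model:
    nodes: dict = field(default_factory=dict)  # name -> Node

    def add(self, name, is_cluster):
        self.nodes[name] = Node(name, is_cluster)

    def link(self, src, dst, weight):
        """Add a weighted link; in this sketch only cluster nodes have outgoing links."""
        if dst not in self.nodes:
            raise ValueError(f"unknown node {dst!r}")
        if not self.nodes[src].is_cluster:
            raise ValueError("only cluster nodes can have outgoing links here")
        if not 0.0 <= weight <= 1.0:
            raise ValueError("link weight must be a probability")
        self.nodes[src].links[dst] = weight

    def sample(self, root, rng):
        """Fire `root`, then let each fired node fire each linked node with
        probability equal to the link weight. Returns the fired word nodes."""
        fired, frontier = {root}, [root]
        while frontier:
            node = self.nodes[frontier.pop()]
            for dst, weight in node.links.items():
                if dst not in fired and rng.random() < weight:
                    fired.add(dst)
                    frontier.append(dst)
        return {name for name in fired if not self.nodes[name].is_cluster}


# Hypothetical example network: root cluster -> "pets" cluster -> word "dog".
model = Model()
for name, is_cluster in [("root", True), ("pets", True), ("dog", False), ("cat", False)]:
    model.add(name, is_cluster)
model.link("root", "pets", 0.8)
model.link("pets", "dog", 0.9)
model.link("pets", "cat", 0.5)
print(model.sample("root", random.Random(1)))
```

Because a node is added to the fired set at most once, sampling terminates even if cluster-to-cluster links form a cycle.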
Specification