Latent semantic taxonomy generation
Abstract
A method for automatically constructing a taxonomy for a collection of documents. For a given collection of documents, a method in accordance with an embodiment of the present invention creates document clusters, assigns taxons (titles) to the clusters, and organizes the clusters in a hierarchy. The clusters in the hierarchy are ordered from general to specific in the depth of the hierarchy, and from most similar to least similar in the breadth of the hierarchy. This method is capable of producing meaningful classifications in a short time.
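As an illustration only, the clustering and taxon-assignment stages described above might be sketched as follows. The sketch assumes latent semantic indexing as the conceptual representation space, a greedy cosine-threshold pass as the clustering rule, and in-cluster document frequency as the taxon selector. None of these choices is stated in the abstract, and the helper names (`top_eigenpairs`, `represent`, `cluster`, `taxon`) are hypothetical. Hierarchy construction is omitted.

```python
import math
from collections import Counter


def top_eigenpairs(sym, k, iters=200):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration with deflation)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        vec = [1.0 + 0.01 * i for i in range(n)]  # fixed start vector (assumption)
        for _ in range(iters):
            nxt = [sum(w * v for w, v in zip(row, vec)) for row in work]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        lam = sum(v * sum(w * u for w, u in zip(row, vec))
                  for v, row in zip(vec, work))
        pairs.append((lam, vec))
        # Deflate so the next pass converges to the next eigenvector.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * vec[i] * vec[j]
    return pairs


def represent(docs, k=2):
    """Map each document to a k-dimensional latent vector.

    The eigenvectors of the document Gram matrix A^T A are the right singular
    vectors of the term-document matrix A, so row d of V * Sigma is document
    d's coordinate in the truncated LSI space.
    """
    bags = [Counter(d.lower().split()) for d in docs]
    gram = [[float(sum(a[t] * b[t] for t in a)) for b in bags] for a in bags]
    pairs = top_eigenpairs(gram, k)
    vectors = [[math.sqrt(max(lam, 0.0)) * vec[d] for lam, vec in pairs]
               for d in range(len(bags))]
    return bags, vectors


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def cluster(vectors, threshold=0.8):
    """Greedy pass: join the first cluster whose centroid is similar enough."""
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            centroid = [sum(vectors[m][c] for m in members) / len(members)
                        for c in range(len(vec))]
            if cosine(vec, centroid) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def taxon(bags, members):
    """Label a cluster with the term found in the most of its documents."""
    doc_freq = Counter(term for m in members for term in bags[m])
    return doc_freq.most_common(1)[0][0]


docs = ["dog cat", "dog vet", "dog leash", "stock bond", "stock trade"]
bags, vectors = represent(docs, k=2)
clusters = cluster(vectors)
taxonomy = {taxon(bags, members): members for members in clusters}
print(taxonomy)  # {'dog': [0, 1, 2], 'stock': [3, 4]}
```

The toy corpus uses two disjoint vocabularies so that the two latent dimensions separate cleanly. On real text the number of dimensions and the similarity threshold would need tuning, and a production system would more likely use a library SVD than the hand-rolled power iteration shown here.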
22 Claims
1. A computer-based method for automatically constructing a taxonomy for a collection of documents, comprising:
(a) generating a representation of each document in the collection of documents in a conceptual representation space;
(b) identifying a set of document clusters in the collection of documents based on a conceptual similarity among the representations of the documents; and
(c) generating a taxon for a document cluster in the set of document clusters based on at least one of (i) a term in a document of at least one of the document clusters, or (ii) a term represented in the conceptual representation space.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer program product comprising a computer usable medium having computer readable program code stored therein that causes an application program for automatically constructing a taxonomy for a collection of documents to execute on an operating system of a computer, the computer readable program code comprising:
computer readable first program code that causes the computer to generate a representation of each document in the collection of documents in a conceptual representation space;
computer readable second program code that causes the computer to identify a set of document clusters in the collection of documents based on a conceptual similarity among the representations of the documents; and
computer readable third program code that causes the computer to generate a taxon for a document cluster in the set of document clusters based on at least one of (i) a term in a document of at least one of the document clusters, or (ii) a term represented in the conceptual representation space.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
Specification