Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture
Abstract
Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture are described. In one aspect, a document clustering method includes providing a document set comprising a plurality of documents, providing a cluster comprising a subset of the documents of the document set, using a plurality of terms of the documents, providing a cluster label indicative of subject matter content of the documents of the cluster, wherein the cluster label comprises a plurality of word senses, and selecting one of the word senses of the cluster label.
36 Claims
1. A document clustering method comprising:
providing a document set comprising a plurality of documents;
providing a cluster comprising a subset of the documents of the document set;
using a plurality of terms of the documents, providing a cluster label indicative of subject matter content of the documents of the cluster, wherein the cluster label comprises a plurality of word senses; and
selecting one of the word senses of the cluster label.
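As an illustration of the labeling step recited in claim 1, the following is a minimal sketch, not the patented method. The claim does not say how the label is derived from the terms; this sketch assumes the label is the term with the highest document frequency within the cluster. The function name `choose_cluster_label`, the whitespace tokenization, and the sample documents are all hypothetical.

```python
from collections import Counter


def choose_cluster_label(cluster_docs, stopwords=frozenset()):
    """Return the term that occurs in the most documents of the cluster.

    Assumption: document frequency is used as the labeling criterion.
    The claim itself only requires that the label be provided using
    terms of the documents.
    """
    doc_freq = Counter()
    for text in cluster_docs:
        # Count each term at most once per document (document frequency).
        terms = {t for t in text.lower().split() if t not in stopwords}
        doc_freq.update(terms)
    if not doc_freq:
        return None
    # Highest document frequency wins; ties are broken alphabetically
    # so the result is deterministic.
    return min(doc_freq, key=lambda t: (-doc_freq[t], t))


# Hypothetical cluster of three short documents.
docs = ["bank loan interest", "bank account deposit", "river bank account"]
label = choose_cluster_label(docs)
```

A label such as "bank" has more than one word sense, which is the situation the final step of the claim addresses.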
13. A document cluster label disambiguation method comprising:
providing a cluster label for a cluster comprising a subset of a plurality of documents of a document set, wherein the cluster label comprises one of a plurality of terms common to at least some of the documents of the cluster and the cluster label comprises a plurality of word senses;
determining, for individual ones of the word senses, a plurality of semantic similarity values for respective ones of the terms, wherein the semantic similarity values are individually indicative of a degree of semantic similarity between one of the word senses and one of the terms;
analyzing the semantic similarity values determined for respective ones of the word senses; and
selecting one of the word senses responsive to the analyzing.
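The determine, analyze, and select steps of claim 13 can be sketched as follows. This is a hedged illustration under stated assumptions: the claim does not specify the similarity measure or the form of the analysis, so the sketch takes the similarity function as a parameter and assumes "analyzing" means summing the values for each sense. The names `select_word_sense`, `toy_similarity`, and the similarity table are hypothetical.

```python
def select_word_sense(senses, terms, similarity):
    """Select the sense with the highest total similarity to the terms.

    `similarity(sense, term)` is a caller-supplied function returning a
    number; which measure to use is an assumption left to the caller.
    """
    # Determine: for each sense, one similarity value per term.
    values = {s: [similarity(s, t) for t in terms] for s in senses}
    # Analyze: summing is assumed here; the claim does not fix the analysis.
    totals = {s: sum(v) for s, v in values.items()}
    # Select: the sense with the largest total.
    best = max(senses, key=lambda s: totals[s])
    return best, totals


# Hypothetical similarity values for two senses of the label "bank".
TOY_TABLE = {
    ("bank/finance", "loan"): 0.9,
    ("bank/finance", "deposit"): 0.8,
    ("bank/finance", "river"): 0.1,
    ("bank/riverside", "loan"): 0.1,
    ("bank/riverside", "deposit"): 0.2,
    ("bank/riverside", "river"): 0.9,
}


def toy_similarity(sense, term):
    # Unknown pairs are treated as having no similarity.
    return TOY_TABLE.get((sense, term), 0.0)


best, totals = select_word_sense(
    ["bank/finance", "bank/riverside"],
    ["loan", "deposit", "river"],
    toy_similarity,
)
```

In practice the similarity function might be backed by a lexical resource; the toy table only keeps the example self-contained.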
19. A document clustering apparatus comprising:
processing circuitry configured to access a document set comprising a plurality of documents, to define a cluster comprising a subset of the documents of the document set, to identify a cluster label indicative of subject matter content of at least one of the documents of the cluster, and to disambiguate the cluster label after the identification of the cluster label to increase the relevancy of the cluster label with respect to the subject matter content of the at least one of the documents compared with the cluster label prior to the disambiguation.
31. An article of manufacture comprising:
media comprising programming configured to cause processing circuitry to:
access a cluster label for a cluster comprising a subset of a plurality of documents of a document set, wherein the cluster label comprises one of a plurality of terms common to at least one of the documents of the cluster and the cluster label comprises a plurality of word senses;
determine, for individual ones of the word senses, a plurality of semantic similarity values for respective ones of the terms, wherein the semantic similarity values are individually indicative of a degree of semantic similarity between one of the word senses and one of the terms;
analyze the semantic similarity values determined for respective ones of the word senses; and
select one of the word senses responsive to the analysis.
Specification