Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture
Abstract
Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture are described. In one aspect, a document clustering method includes providing a document set comprising a plurality of documents, providing a cluster comprising a subset of the documents of the document set, using a plurality of terms of the documents, providing a cluster label indicative of subject matter content of the documents of the cluster, wherein the cluster label comprises a plurality of word senses, and selecting one of the word senses of the cluster label.
39 Claims
1. A document clustering method comprising:
providing a document set comprising a plurality of documents;
providing a cluster comprising a subset of the documents of the document set, wherein the subset comprises a plurality of the documents;
using a plurality of terms of the documents of the cluster which are indicative of subject matter content of the documents of the cluster, selecting a cluster label indicative of the subject matter content of the documents of the cluster, wherein the cluster label is selected at least in part by co-occurrence of the cluster label and the plurality of terms of the documents of the cluster and wherein the cluster label comprises a plurality of word senses; and
selecting one of the word senses of the cluster label having an increased relevancy with respect to the plurality of terms of the documents of the cluster compared with the relevancies of others of the word senses.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14
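As an illustration only, the label-selection step of claim 1 (choosing a label "at least in part by co-occurrence" with the content-indicative terms) could be sketched as follows. The claim does not prescribe a scoring rule; the function name `select_cluster_label`, the whitespace tokenization, and the count-based score are all assumptions made for this sketch.

```python
from collections import Counter


def select_cluster_label(cluster_docs, key_terms, candidates):
    """Pick the candidate label that co-occurs most with the key terms.

    Hypothetical scoring (an assumption, not taken from the claims):
    for every document containing a candidate label, add the number of
    distinct key terms that appear in that same document.
    """
    scores = Counter()
    for doc in cluster_docs:
        tokens = set(doc.lower().split())
        hits = len(tokens & key_terms)  # key terms present in this document
        for label in candidates:
            if label in tokens:
                scores[label] += hits
    # Sorting first makes ties resolve alphabetically.
    return max(sorted(candidates), key=lambda c: scores[c])


docs = [
    "the bank approved the loan and the interest rate",
    "bank deposit interest account",
    "river loan",
]
label = select_cluster_label(docs, {"loan", "interest", "deposit"}, ["bank", "river"])
```

In this toy cluster the label "bank" is chosen, and it is still ambiguous between several word senses, which is what the final step of the claim addresses.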
12. A document cluster label disambiguation method comprising:
selecting a cluster label for a cluster comprising a subset of a plurality of documents of a document set at least in part by co-occurrence of the cluster label and a plurality of terms of the documents of the cluster which are indicative of subject matter content of the documents of the cluster, wherein the subset comprises a plurality of the documents and wherein the cluster label comprises one of a plurality of terms common to at least some of the documents of the cluster and the cluster label comprises a plurality of word senses;
determining, for individual ones of the word senses, a plurality of semantic similarity values for respective ones of the terms, wherein the semantic similarity values are individually indicative of a degree of semantic similarity between one of the word senses and one of the terms;
analyzing the semantic similarity values determined for respective ones of the word senses;
selecting one of the word senses using the analyzing; and
wherein the selecting the one of the word senses comprises, using the terms of the documents of the cluster, selecting the one of the word senses having an increased relevancy with respect to the subject matter content of the documents of the cluster compared with the relevancies of others of the word senses.
Dependent claims: 15, 16, 17, 18, 19
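The determining, analyzing, and selecting steps of claim 12 can be pictured with a minimal sketch. The claim leaves both the semantic similarity measure and the analysis open, so the pieces below are assumptions: the similarity function is passed in by the caller, the analysis is a simple mean per word sense, and `TOY_RELATED` with `toy_similarity` is a made-up stand-in for a real lexical resource.

```python
from statistics import mean


def disambiguate_label(senses, terms, similarity):
    """Return (best_sense, scores) for a cluster label.

    For each word sense, compute one similarity value per cluster term,
    then aggregate the values. Averaging is an assumption of this
    sketch; the claim only requires that the values be analyzed.
    """
    scores = {}
    for sense in senses:
        values = [similarity(sense, term) for term in terms]
        scores[sense] = mean(values)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy sense inventory: each sense lists related words.
TOY_RELATED = {
    "bank (financial institution)": {"loan", "interest", "deposit"},
    "bank (river edge)": {"river", "water", "shore"},
}


def toy_similarity(sense, term):
    """Toy measure: 1.0 if the term is listed for the sense, else 0.0."""
    return 1.0 if term in TOY_RELATED[sense] else 0.0


best, scores = disambiguate_label(
    list(TOY_RELATED), ["loan", "interest", "river"], toy_similarity
)
```

With two of the three cluster terms related to finance, the financial sense receives the higher aggregate score and is selected. Any graded similarity measure could be substituted for `toy_similarity` without changing `disambiguate_label`.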
20. A document clustering apparatus comprising:
processing circuitry configured to access a document set comprising a plurality of documents, to define a cluster comprising a subset of the documents of the document set and wherein the subset comprises a plurality of the documents, to identify a cluster label indicative of subject matter content of at least one of the documents of the cluster at least in part by co-occurrence of the cluster label and a plurality of terms of the documents of the cluster which are indicative of the subject matter content of the documents of the cluster, and to use the terms of the documents of the cluster to disambiguate the cluster label after the identification of the cluster label to increase the relevancy of the cluster label with respect to the subject matter content of the at least one of the documents compared with the cluster label prior to the disambiguation.
Dependent claims: 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32
33. An article of manufacture comprising:
a computer-readable storage medium comprising programming configured to cause processing circuitry to:
select a cluster label for a cluster comprising a subset of a plurality of documents of a document set at least in part by co-occurrence of the cluster label and a plurality of terms of the documents of the cluster which are indicative of subject matter content of the documents of the cluster, wherein the subset comprises a plurality of the documents and wherein the cluster label comprises one of a plurality of terms common to at least one of the documents of the cluster and the cluster label comprises a plurality of word senses;
determine, for individual ones of the word senses, a plurality of semantic similarity values for respective ones of the terms, wherein the semantic similarity values are individually indicative of a degree of semantic similarity between one of the word senses and one of the terms;
analyze the semantic similarity values determined for respective ones of the word senses; and
select one of the word senses using the analysis, wherein the one of the word senses has an increased relevancy with respect to the terms of the documents of the cluster compared with the relevancies of others of the word senses.
Dependent claims: 34, 35, 36, 37, 38, 39
Specification