Methods and apparatus for rank-based response set clustering
Abstract
A method for identifying clusters of similar documents from among a set of documents is described. A particular document is selected based on rank from among a ranked set of documents, wherein the ranked set of documents is included among available documents of the set of documents. A probe is generated based on the particular document, the probe comprising one or more features. Documents that satisfy a similarity condition are found from among the available documents using a search based upon the probe. Some or all documents found are associated with a particular cluster of documents. The process can be repeated to generate further clusters. The method can be implemented with a computer, and associated programming instructions can be contained within a computer-readable carrier.
41 Citations
25 Claims
1. A method for identifying clusters of similar documents from among a set of documents, the method comprising:
(a) selecting a particular document based on rank from among a ranked set of documents;
(b) generating a probe based on the particular document, the probe comprising one or more features;
(c) finding documents that satisfy a similarity condition from among available documents of the set of documents using a search based upon the probe;
(d) associating some or all documents found with a particular cluster of documents; and
(e) repeating steps (a)-(d) using another probe as the probe and using another similarity condition as the similarity condition until a halting condition is satisfied to identify at least one other cluster of documents, wherein those documents of the set of documents previously associated with a cluster of documents are not included among the available documents.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 25
13. An apparatus for identifying clusters of similar documents from among a set of documents, comprising:
a memory; and
a processor coupled to the memory, wherein the processor is configured to execute the steps of:
(a) selecting a particular document based on rank from among a ranked set of documents;
(b) generating a probe based on the particular document, the probe comprising one or more features;
(c) finding documents that satisfy a similarity condition from among available documents of the set of documents using a search based upon the probe;
(d) associating some or all documents found with a particular cluster of documents; and
(e) repeating steps (a)-(d) using another probe as the probe and using another similarity condition as the similarity condition until a halting condition is satisfied to identify at least one other cluster of documents, wherein those documents of the set of documents previously associated with a cluster of documents are not included among the available documents.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
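Steps (a)-(e), which claims 1 and 13 share, can be illustrated with a minimal Python sketch. It is not the claimed implementation, and the claims do not fix any of its specifics. The following are assumptions made only for illustration: the whitespace tokenizer, a probe made of the document's most frequent terms, a similarity condition defined as the fraction of probe features present in a document compared against a fixed threshold, and a halting condition of "no documents left or `max_clusters` reached". All function and parameter names are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def make_probe(text, max_features=5):
    """Step (b): build a probe from the document's most frequent terms (assumed feature choice)."""
    counts = Counter(tokenize(text))
    return {term for term, _ in counts.most_common(max_features)}


def similarity(probe, text):
    """Fraction of probe features that occur in the document (assumed similarity measure)."""
    terms = set(tokenize(text))
    return len(probe & terms) / len(probe) if probe else 0.0


def cluster_by_rank(ranked_docs, threshold=0.5, max_clusters=10):
    """ranked_docs is a list of (doc_id, text) pairs, best-ranked first."""
    available = list(ranked_docs)
    clusters = []
    # Assumed halting condition: nothing left to cluster, or enough clusters found.
    while available and len(clusters) < max_clusters:
        seed_id, seed_text = available[0]      # step (a): highest-ranked available document
        probe = make_probe(seed_text)          # step (b): probe from that document
        found = [doc_id for doc_id, text in available
                 if similarity(probe, text) >= threshold]  # step (c): search the available documents
        if seed_id not in found:
            # Guard so the seed always leaves the pool (e.g. empty text gives an empty probe).
            found.insert(0, seed_id)
        clusters.append(found)                 # step (d): associate found documents with a cluster
        members = set(found)
        # Step (e): clustered documents are no longer among the available documents.
        available = [(d, t) for d, t in available if d not in members]
    return clusters
```

Because the seed is always the highest-ranked document still available, each pass produces a new probe. The claims also recite a different similarity condition on each pass, whereas this sketch keeps one fixed threshold for brevity. For example, calling `cluster_by_rank` on `[("d1", "apple banana apple"), ("d2", "car engine car"), ("d3", "banana apple pie"), ("d4", "engine car repair")]` groups `d1` with `d3` and `d2` with `d4`.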
Specification