Methods and apparatus for probe-based clustering
Abstract
A method for identifying clusters of similar documents from among a set of documents is described. A particular document is selected from among available documents of the set of documents, and a probe is generated based on the particular document. The probe comprises one or more features. Documents are found that satisfy a similarity condition using the probe from among the available documents. Some or all of the documents that satisfy the similarity condition are associated with a particular cluster of documents. The process can be repeated to generate further clusters. The method can be implemented with a computer, and associated programming instructions can be contained within a computer readable carrier.
23 Claims
1. A method for identifying clusters of similar documents from among a set of documents, the method comprising:
(a) selecting a particular document from among available documents of the set of documents;
(b) generating a probe based on the particular document, the probe comprising one or more features;
(c) finding documents that satisfy a similarity condition using the probe from among the available documents;
(d) associating some or all of the documents that satisfy the similarity condition with a particular cluster of documents;
(e) repeating steps (a)-(d) using another probe as the probe and using another similarity condition as the similarity condition until a halting condition is satisfied to identify at least one other cluster of documents, wherein those documents of the set of documents previously associated with a cluster of documents are not included among the available documents.
12. An apparatus for identifying clusters of similar documents from among a set of documents, comprising:
a memory; and
a processor coupled to the memory, wherein the processor is configured to execute the steps of:
(a) selecting a particular document from among available documents of the set of documents;
(b) generating a probe based on the particular document, the probe comprising one or more features;
(c) finding documents that satisfy a similarity condition using the probe from among the available documents;
(d) associating some or all of the documents that satisfy the similarity condition with a particular cluster of documents;
(e) repeating steps (a)-(d) using another probe as the probe and using another similarity condition as the similarity condition until a halting condition is satisfied to identify at least one other cluster of documents, wherein those documents of the set of documents previously associated with a cluster of documents are not included among the available documents.
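As an illustration only, steps (a)-(e) recited above can be sketched in Python. Everything concrete here is an assumption rather than something the claims specify: documents are modeled as lists of tokens, the probe is taken to be the k most frequent terms of the selected document, the similarity condition is the fraction of probe terms a document contains measured against a fixed threshold, and the halting condition is that no documents remain available or a cluster limit is reached. The names make_probe, matches, and probe_cluster are hypothetical. For simplicity the sketch reuses the same threshold on every pass, although the claims allow a different similarity condition on each repetition.

```python
from collections import Counter


def make_probe(tokens, k=3):
    """Step (b): build a probe from the k most frequent terms (an assumed feature choice)."""
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return set(ranked[:k])


def matches(probe, tokens, threshold):
    """Step (c): assumed similarity condition, the fraction of probe terms found in the document."""
    if not probe:
        return False
    return len(probe & set(tokens)) / len(probe) >= threshold


def probe_cluster(docs, k=3, threshold=0.5, max_clusters=None):
    """Cluster docs (a dict of id -> token list) by repeated probing."""
    available = dict(docs)  # documents not yet associated with any cluster
    clusters = []
    # Assumed halting condition: nothing left, or enough clusters found.
    while available and (max_clusters is None or len(clusters) < max_clusters):
        seed_id = next(iter(available))            # step (a): pick an available document
        probe = make_probe(available[seed_id], k)  # step (b)
        members = [doc_id for doc_id, toks in available.items()
                   if matches(probe, toks, threshold)]  # step (c)
        if seed_id not in members:
            members.append(seed_id)  # keep the seed so every pass makes progress
        clusters.append(members)                   # step (d)
        for doc_id in members:
            del available[doc_id]  # clustered documents are no longer available
    return clusters                                # the while loop is step (e)
```

Calling probe_cluster on a dictionary of tokenized documents returns a list of clusters, each a list of document ids. Removing clustered documents from the available pool on every pass is what guarantees that the loop terminates and that no document lands in two clusters.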
Specification