Methods and apparatus for interactive document clustering
First Claim
1. A computerized method for forming clusters of documents from among a set of documents, the method comprising:
(a) identifying a plurality of seed candidate documents;
(b) generating candidate probes based upon the seed candidate documents, the candidate probes each comprising one or more features from the seed candidate documents;
(c) displaying information regarding the candidate probes to a user;
(d) receiving user input regarding the candidate probes and defining a set of probes from which to form clusters of documents based upon the user input regarding the candidate probes;
(e) selecting a probe and forming a cluster of documents from among available documents of the set of documents using the probe, wherein forming the cluster of documents comprises finding documents that satisfy a similarity condition relative to the probe and associating some or all of the documents that satisfy the similarity condition with a particular cluster of documents; and
(f) repeating step (e) using another probe as the probe and using another similarity condition as the similarity condition until a halting condition is satisfied to form at least one other cluster of documents, wherein those documents of the set of documents previously associated with a cluster of documents are not included among the available documents.
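Steps (e) and (f) can be illustrated with a minimal Python sketch. It is not taken from the patent: the representation of a document as a set of terms, the `overlap` similarity measure, the per-probe threshold used as the similarity condition, and the halting condition (probes exhausted or no documents left) are all assumptions chosen for illustration, as are the names `overlap` and `form_clusters`.

```python
from typing import Dict, List, Set, Tuple


def overlap(probe: Set[str], doc: Set[str]) -> float:
    """Fraction of the probe's features that occur in the document.

    Assumed similarity measure; the claim only requires some
    "similarity condition relative to the probe".
    """
    return len(probe & doc) / len(probe) if probe else 0.0


def form_clusters(
    docs: Dict[str, Set[str]],
    probes: List[Tuple[Set[str], float]],
) -> List[List[str]]:
    """Form one cluster per probe from the documents still available.

    Each probe carries its own threshold, so the similarity condition
    may differ from one pass to the next, as in step (f).
    """
    available = dict(docs)  # documents not yet assigned to a cluster
    clusters: List[List[str]] = []
    for probe, threshold in probes:
        if not available:  # assumed halting condition
            break
        members = [
            doc_id
            for doc_id, features in available.items()
            if overlap(probe, features) >= threshold
        ]
        # Clustered documents leave the pool of available documents.
        for doc_id in members:
            del available[doc_id]
        clusters.append(members)
    return clusters


docs = {
    "d1": {"solar", "panel", "energy"},
    "d2": {"solar", "energy", "grid"},
    "d3": {"wind", "turbine", "energy"},
    "d4": {"wind", "energy", "grid"},
}
probes = [({"solar", "energy"}, 1.0), ({"energy"}, 1.0)]
print(form_clusters(docs, probes))
```

In this sketch the second probe would also match d1 and d2, but they were already placed in the first cluster and are therefore no longer among the available documents.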
Abstract
A computer-based process that permits user interaction is described for identifying, from among a set of documents, clusters of documents that have some degree of similarity. A plurality of seed candidate documents is identified. Candidate probes based upon the seed candidate documents are generated, and information regarding the candidate probes is displayed to a user. User input regarding the candidate probes is received, and a set of probes from which to form clusters of documents is defined based upon that user input. A probe is selected, and a cluster of documents is formed using the probe from among the available documents not yet clustered. The process can be repeated to generate further clusters. The process can be implemented with a computer system, and associated programming instructions can be contained within a computer readable medium.
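The probe-definition stage (generating candidate probes from seed documents, displaying them, and applying user input) might be sketched as follows. This is an illustrative assumption, not the patented method: features are taken to be the most frequent whitespace-separated terms of a seed document, user input is modeled as a set of accepted candidate indices rather than an interactive prompt, and the function names `candidate_probe`, `describe`, and `define_probes` are hypothetical.

```python
from collections import Counter
from typing import List, Set


def candidate_probe(seed_text: str, n_features: int = 3) -> List[str]:
    """Build a candidate probe from the most frequent terms of a seed document.

    Assumed feature selection; ties are broken alphabetically so the
    result is deterministic.
    """
    counts = Counter(seed_text.lower().split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n_features]]


def describe(candidates: List[List[str]]) -> List[str]:
    """Render each candidate probe as a numbered line for display to a user."""
    return [f"[{i}] " + ", ".join(probe) for i, probe in enumerate(candidates)]


def define_probes(candidates: List[List[str]], accepted: Set[int]) -> List[List[str]]:
    """Keep only the candidate probes whose indices the user accepted."""
    return [probe for i, probe in enumerate(candidates) if i in accepted]


seeds = ["solar solar energy grid", "wind wind turbine energy"]
candidates = [candidate_probe(text, n_features=2) for text in seeds]
for line in describe(candidates):
    print(line)
probes = define_probes(candidates, accepted={0})
```

Modeling the user's response as a set of indices keeps the sketch free of interactive I/O; a real interface could equally let the user edit or reorder the candidate probes.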
21 Claims
1. A computerized method for forming clusters of documents from among a set of documents, the method comprising:
(a) identifying a plurality of seed candidate documents;
(b) generating candidate probes based upon the seed candidate documents, the candidate probes each comprising one or more features from the seed candidate documents;
(c) displaying information regarding the candidate probes to a user;
(d) receiving user input regarding the candidate probes and defining a set of probes from which to form clusters of documents based upon the user input regarding the candidate probes;
(e) selecting a probe and forming a cluster of documents from among available documents of the set of documents using the probe, wherein forming the cluster of documents comprises finding documents that satisfy a similarity condition relative to the probe and associating some or all of the documents that satisfy the similarity condition with a particular cluster of documents; and
(f) repeating step (e) using another probe as the probe and using another similarity condition as the similarity condition until a halting condition is satisfied to form at least one other cluster of documents, wherein those documents of the set of documents previously associated with a cluster of documents are not included among the available documents.
Dependent claims: 2, 3, 4, 5, 6, 7
8. An apparatus for identifying clusters of documents from among a set of documents, comprising:
a memory; and
a processing system coupled to the memory, wherein the processing system is configured to:
(a) identify a plurality of seed candidate documents;
(b) generate candidate probes based upon the seed candidate documents, the candidate probes each comprising one or more features from the seed candidate documents;
(c) display information regarding the candidate probes to a user;
(d) receive user input regarding the candidate probes and define a set of probes from which to form clusters of documents based upon the user input regarding the candidate probes;
(e) select a probe and form a cluster of documents from among available documents of the set of documents using the probe, wherein forming the cluster of documents comprises finding documents that satisfy a similarity condition relative to the probe and associating some or all of the documents that satisfy the similarity condition with a particular cluster of documents; and
(f) repeat step (e) using another probe as the probe and using another similarity condition as the similarity condition until a halting condition is satisfied to form at least one other cluster of documents, wherein those documents of the set of documents previously associated with a cluster of documents are not included among the available documents.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer readable medium comprising processing instructions for identifying clusters of documents from among a set of documents, wherein the processing instructions cause a processing system to:
(a) identify a plurality of seed candidate documents;
(b) generate candidate probes based upon the seed candidate documents, the candidate probes each comprising one or more features from the seed candidate documents;
(c) display information regarding the candidate probes to a user;
(d) receive user input regarding the candidate probes and define a set of probes from which to form clusters of documents based upon the user input regarding the candidate probes;
(e) select a probe and form a cluster of documents from among available documents of the set of documents using the probe, wherein forming the cluster of documents comprises finding documents that satisfy a similarity condition relative to the probe and associating some or all of the documents that satisfy the similarity condition with a particular cluster of documents; and
(f) repeat step (e) using another probe as the probe and using another similarity condition as the similarity condition until a halting condition is satisfied to form at least one other cluster of documents, wherein those documents of the set of documents previously associated with a cluster of documents are not included among the available documents.
Dependent claims: 16, 17, 18, 19, 20, 21
Specification