Methods and apparatus for interactive document clustering

US 20090287668A1
Filed: 05/16/2008
Published: 11/19/2009
Est. Priority Date: 05/16/2008
Status: Abandoned Application

Abstract

A computer-based process is described for identifying, from among a set of documents, clusters of documents that have some degree of similarity, in a manner that permits user interaction with the process. A plurality of seed candidate documents is identified. Candidate probes based upon the seed candidate documents are generated, and information regarding the candidate probes is displayed to a user. User input regarding the candidate probes is received, and a set of probes from which to form clusters of documents is defined based upon that input. A probe is selected, and a cluster of documents is formed using the probe from among available documents not yet clustered. The process can be repeated to generate further clusters. The process can be implemented with a computer system, and associated programming instructions can be contained within a computer readable medium.
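The cluster-forming loop described in the abstract can be illustrated with a minimal sketch. This is not the applicants' implementation: the Jaccard overlap used as the similarity measure, the per-probe threshold used as the similarity condition, the halting rule, and all names (`jaccard`, `form_clusters`, the sample documents) are assumptions chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two feature sets; an assumed stand-in for the similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def form_clusters(
    docs: Dict[str, Set[str]],
    probes: List[Tuple[Set[str], float]],
) -> List[List[str]]:
    """Form one cluster per probe, drawing only from documents not yet clustered.

    Each probe is (features, threshold); the threshold plays the role of that
    probe's own similarity condition. The loop halts when the probes or the
    available documents run out (an assumed halting condition).
    """
    available = dict(docs)
    clusters: List[List[str]] = []
    for features, threshold in probes:
        if not available:
            break
        members = sorted(
            doc_id
            for doc_id, doc_features in available.items()
            if jaccard(features, doc_features) >= threshold
        )
        for doc_id in members:
            # Clustered documents leave the pool of available documents.
            del available[doc_id]
        clusters.append(members)
    return clusters


# Hypothetical documents, each reduced to a set of features (terms).
docs = {
    "d1": {"patent", "claim", "probe"},
    "d2": {"patent", "claim", "cluster"},
    "d3": {"football", "score", "league"},
    "d4": {"football", "league", "patent"},
}
probes = [({"patent", "claim"}, 0.5), ({"football", "league"}, 0.3)]
clusters = form_clusters(docs, probes)
```

Because each probe sees only the documents left over by earlier probes, the order in which probes are applied can change the resulting clusters.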

21 Claims

1. A computerized method for forming clusters of documents from among a set of documents, the method comprising:
- (a) identifying a plurality of seed candidate documents;
  
  (b) generating candidate probes based upon the seed candidate documents, the candidate probes each comprising one or more features from the seed candidate documents;
  
  (c) displaying information regarding the candidate probes to a user;
  
  (d) receiving user input regarding the candidate probes and defining a set of probes from which to form clusters of documents based upon the user input regarding the candidate probes;
  
  (e) selecting a probe and forming a cluster of documents from among available documents of the set of documents using the probe, wherein forming the cluster of documents comprises finding documents that satisfy a similarity condition relative to the probe and associating some or all of the documents that satisfy the similarity condition with a particular cluster of documents; and
  
  (f) repeating step (e) using another probe as the probe and using another similarity condition as the similarity condition until a halting condition is satisfied to form at least one other cluster of documents, wherein those documents of the set of documents previously associated with a cluster of documents are not included among the available documents.
  - 2. The method of claim 1, comprising:
    - receiving a user command for user interaction regarding forming clusters of documents;
      
      displaying clustering results to the user.
  - 3. The method of claim 2, comprising:
    - receiving a user command to reject a cluster of documents that was formed; and
      
      releasing the documents of the rejected cluster back to the set of available documents.
  - 4. The method of claim 2, comprising:
    - receiving a user command to define an additional probe for further cluster formation after receiving the command for user interaction; and
      
      forming a cluster of documents from among the available documents using the additional probe.
  - 5. The method of claim 2, wherein the user command for user interaction is received prior to satisfying the halting condition.
  - 6. The method of claim 2, wherein the user command for user interaction is received after satisfying the halting condition.
  - 7. The method of claim 1, wherein identifying a plurality of seed candidate documents is carried out utilizing user input regarding the plurality of seed candidate documents.
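The interactive behavior recited in claims 3 and 4 (rejecting a cluster releases its documents, and a user-defined additional probe can then form a new cluster) might be sketched as follows. The class `ClusterSession`, its methods, and the "share at least `min_shared` features" similarity condition are hypothetical and do not come from the application.

```python
from typing import Dict, List, Set


class ClusterSession:
    """Tracks which documents are still available across interactive steps."""

    def __init__(self, docs: Dict[str, Set[str]]) -> None:
        self.docs = docs
        self.available: Set[str] = set(docs)
        self.clusters: List[List[str]] = []

    def form_cluster(self, probe: Set[str], min_shared: int = 1) -> List[str]:
        """Cluster the available documents sharing at least min_shared probe features."""
        members = sorted(
            d for d in self.available if len(self.docs[d] & probe) >= min_shared
        )
        self.available -= set(members)
        self.clusters.append(members)
        return members

    def reject_cluster(self, index: int) -> None:
        """Discard a cluster and release its documents back to the available pool."""
        released = self.clusters.pop(index)
        self.available |= set(released)


session = ClusterSession({
    "d1": {"probe", "cluster"},
    "d2": {"probe", "seed"},
    "d3": {"seed", "user"},
})
first = session.form_cluster({"probe"})   # takes d1 and d2
session.reject_cluster(0)                 # user rejects it; d1 and d2 are released
second = session.form_cluster({"seed"})   # additional probe defined by the user
```

After the rejection, `d2` is available again, so the additional probe can place it in a different cluster than the one the user discarded.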

8. An apparatus for identifying clusters of documents from among a set of documents, comprising:
- a memory; and
  
  a processing system coupled to the memory, wherein the processing system is configured to:
  
  (a) identify a plurality of seed candidate documents;
  
  (b) generate candidate probes based upon the seed candidate documents, the candidate probes each comprising one or more features from the seed candidate documents;
  
  (c) display information regarding the candidate probes to a user;
  
  (d) receive user input regarding the candidate probes and define a set of probes from which to form clusters of documents based upon the user input regarding the candidate probes;
  
  (e) select a probe and form a cluster of documents from among available documents of the set of documents using the probe, wherein forming the cluster of documents comprises finding documents that satisfy a similarity condition relative to the probe and associating some or all of the documents that satisfy the similarity condition with a particular cluster of documents; and
  
  (f) repeat step (e) using another probe as the probe and using another similarity condition as the similarity condition until a halting condition is satisfied to form at least one other cluster of documents, wherein those documents of the set of documents previously associated with a cluster of documents are not included among the available documents.
  - 9. The apparatus of claim 8, wherein the processing system is configured to:
    - receive a user command for user interaction regarding forming clusters of documents; and
      
      display clustering results to the user.
  - 10. The apparatus of claim 9, wherein the processing system is configured to:
    - receive a user command to reject a cluster of documents that was formed; and
      
      release the documents of the rejected cluster back to the set of available documents.
  - 11. The apparatus of claim 9, wherein the processing system is configured to:
    - receive a user command to define an additional probe for further cluster formation after receiving the command for user interaction; and
      
      form a cluster of documents from among the available documents using the additional probe.
  - 12. The apparatus of claim 9, wherein the user command for user interaction is received prior to satisfying the halting condition.
  - 13. The apparatus of claim 9, wherein the user command for user interaction is received after satisfying the halting condition.
  - 14. The apparatus of claim 8, wherein the processing system is configured to identify a plurality of seed candidate documents utilizing user input regarding the plurality of seed candidate documents.
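Steps (a) through (d), in which candidate probes are generated from seed candidate documents and then narrowed by user input, could look roughly like the sketch below. The claims say only that a probe comprises one or more features from the seed documents; choosing the most frequent terms longer than three characters is an assumption made here, and `candidate_probes` and `select_probes` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Set


def candidate_probes(seeds: Dict[str, str], top_k: int = 2) -> Dict[str, Set[str]]:
    """Build one candidate probe per seed document from its most frequent terms."""
    probes: Dict[str, Set[str]] = {}
    for seed_id, text in seeds.items():
        # Assumed feature extraction: lowercase terms longer than three characters.
        counts = Counter(w for w in text.lower().split() if len(w) > 3)
        probes[seed_id] = {term for term, _ in counts.most_common(top_k)}
    return probes


def select_probes(
    candidates: Dict[str, Set[str]], accepted_ids: List[str]
) -> List[Set[str]]:
    """Keep only the candidate probes the user accepted, in the user's order."""
    return [candidates[s] for s in accepted_ids if s in candidates]


# Hypothetical seed candidate documents.
seeds = {
    "s1": "probe probe probe cluster cluster seed",
    "s2": "user user input input input the",
}
candidates = candidate_probes(seeds)
# The user's accepted seed ids stand in for the "user input" of step (d).
chosen = select_probes(candidates, ["s2"])
```

The set returned by `select_probes` is what a cluster-forming loop would then iterate over, one probe at a time.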

15. A computer readable medium comprising processing instructions for identifying clusters of documents from among a set of documents, wherein the processing instructions cause a processing system to:
- (a) identify a plurality of seed candidate documents;
  
  (b) generate candidate probes based upon the seed candidate documents, the candidate probes each comprising one or more features from the seed candidate documents;
  
  (c) display information regarding the candidate probes to a user;
  
  (d) receive user input regarding the candidate probes and define a set of probes from which to form clusters of documents based upon the user input regarding the candidate probes;
  
  (e) select a probe and form a cluster of documents from among available documents of the set of documents using the probe, wherein forming the cluster of documents comprises finding documents that satisfy a similarity condition relative to the probe and associating some or all of the documents that satisfy the similarity condition with a particular cluster of documents; and
  
  (f) repeat step (e) using another probe as the probe and using another similarity condition as the similarity condition until a halting condition is satisfied to form at least one other cluster of documents, wherein those documents of the set of documents previously associated with a cluster of documents are not included among the available documents.
  - 16. The computer readable medium of claim 15, wherein the computer readable medium comprises processing instructions that cause a processing system to:
    - receive a user command for user interaction regarding forming clusters of documents; and
      
      display clustering results to the user.
  - 17. The computer readable medium of claim 16, wherein the computer readable medium comprises processing instructions that cause a processing system to:
    - receive a user command to reject a cluster of documents that was formed; and
      
      release the documents of the rejected cluster back to the set of available documents.
  - 18. The computer readable medium of claim 16, wherein the computer readable medium comprises processing instructions that cause a processing system to:
    - receive a user command to define an additional probe for further cluster formation after receiving the command for user interaction; and
      
      form a cluster of documents from among the available documents using the additional probe.
  - 19. The computer readable medium of claim 16, wherein the user command for user interaction is received prior to satisfying the halting condition.
  - 20. The computer readable medium of claim 16, wherein the user command for user interaction is received after satisfying the halting condition.
  - 21. The computer readable medium of claim 15, wherein the computer readable medium comprises processing instructions that cause a processing system to identify a plurality of seed candidate documents utilizing user input regarding the plurality of seed candidate documents.

Current Assignee: Justsystems Evans Research Incorporated
Original Assignee: Justsystems Evans Research Incorporated
Inventors: Bennett, Jeffrey; Sheftel, Victor M.; Evans, David A.
Application Number: US12/153,331
Publication Number: US 20090287668A1
US Class Current: 1/1
CPC Class Codes: G06F 16/355 Class or cluster creation o...
