Methods and apparatus for rank-based response set clustering

US 20070112867A1
Filed: 11/15/2005
Published: 05/17/2007
Est. Priority Date: 11/15/2005
Status: Abandoned Application

2 Assignments

0 Petitions

Abstract

A method for identifying clusters of similar documents from among a set of documents is described. A particular document is selected based on rank from among a ranked set of documents, wherein the ranked set of documents is included among available documents of the set of documents. A probe is generated based on the particular document. The probe comprises one or more features. Documents that satisfy a similarity condition are found from among the available documents using a search based upon the probe. Some or all documents found are associated with a particular cluster of documents. The process can be repeated to generate further clusters. The method can be implemented with a computer, and associated programming instructions can be contained within a computer readable carrier.
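
The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the feature representation (sets of terms), the Jaccard similarity test with a fixed `threshold`, the `max_clusters` halting condition, and the name `rank_based_clusters` are all assumptions chosen only to make the steps concrete.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two feature sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_based_clusters(
    ranked_ids: List[str],
    features: Dict[str, Set[str]],
    threshold: float = 0.3,   # assumed similarity condition
    max_clusters: int = 10,   # assumed halting condition
) -> List[List[str]]:
    """Cluster a ranked response set by repeatedly probing with the top document."""
    available = list(ranked_ids)  # best-ranked document first
    clusters: List[List[str]] = []
    while available and len(clusters) < max_clusters:
        seed = available[0]        # select by rank among available documents
        probe = features[seed]     # here the probe is simply the seed's features
        # Search the available documents for those similar enough to the probe.
        found = [d for d in available if jaccard(probe, features[d]) >= threshold]
        if seed not in found:      # a seed with no features would otherwise stall the loop
            found.insert(0, seed)
        clusters.append(found)     # associate the found documents with one cluster
        taken = set(found)
        # Documents already assigned to a cluster are no longer available.
        available = [d for d in available if d not in taken]
    return clusters
```

Because each pass removes the clustered documents from the available pool, every document lands in at most one cluster, and the order of the clusters follows the rank of their seed documents.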

41 Citations

25 Claims

1. A method for identifying clusters of similar documents from among a set of documents, the method comprising:
- (a) selecting a particular document based on rank from among a ranked set of documents;
  
  (b) generating a probe based on the particular document, the probe comprising one or more features;
  
  (c) finding documents that satisfy a similarity condition from among available documents of the set of documents using a search based upon the probe;
  
  (d) associating some or all documents found with a particular cluster of documents; and
  
  (e) repeating steps (a)-(d) using another probe as the probe and using another similarity condition as the similarity condition until a halting condition is satisfied to identify at least one other cluster of documents, wherein those documents of the set of documents previously associated with a cluster of documents are not included among the available documents.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 25)
- - 2. The method of claim 1, wherein selecting the particular document based on rank comprises selecting the highest ranked document of the ranked set of documents.
  - 3. The method of claim 1, wherein generating the probe based on the particular document comprises generating the probe based on the particular document and based on a feature vector used to generate the ranked set of documents.
  - 4. The method of claim 1, comprising generating an additional probe based on said probe and based on a feature vector used to generate the ranked set of documents, such that finding documents in step (c) is based upon said probe and said additional probe.
  - 5. The method of claim 1, further comprising:
    - generating a new probe based on a subset of the documents found at step (c); and
      
      finding documents from among the available documents using a search based upon the new probe, wherein the associating in step (d) is based on documents found using the search based upon the new probe.
  - 6. The method of claim 1, wherein said another similarity condition is the same as the similarity condition.
  - 7. The method of claim 1, wherein the probe comprises the particular document.
  - 8. The method of claim 1, wherein the probe comprises a subset of features selected from the particular document.
  - 9. The method of claim 1, wherein the probe comprises a subset of features selected from multiple documents of the set of documents, and wherein the subset of features includes features of the particular document.
  - 10. The method of claim 1, comprising ranking the documents of said particular cluster and ranking the documents of said at least one other cluster.
  - 11. The method of claim 1, comprising generating an identifier using the probe that describes content of the particular cluster of documents.
  - 12. The method of claim 1, comprising refining the probe by reforming the probe using at least one new document from the set of documents.
  - 25. A computer readable carrier comprising processing instructions adapted to cause a processor to execute the method of claim 1.
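
Two of the probe variants in the dependent claims can be sketched briefly: a probe built from a subset of the particular document's features (claim 8) and a new probe formed from a subset of the documents found by the first search (claim 5). The claims do not say how features are weighted or selected, so the term-count weighting, the cutoff `k`, and the function names below are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def top_features(term_counts: Counter, k: int = 5) -> List[str]:
    """Keep the k most frequent terms; ties are broken alphabetically."""
    ordered = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ordered[:k]]


def reform_probe(found_docs: Iterable[Counter], k: int = 5) -> List[str]:
    """Pool term counts over the chosen found documents, then reselect features."""
    pooled: Counter = Counter()
    for counts in found_docs:
        pooled.update(counts)  # Counter.update adds counts rather than replacing them
    return top_features(pooled, k)
```

In this sketch the reformed probe reflects what the found documents have in common rather than the seed alone, which is one plausible reason to run the second search.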

13. An apparatus for identifying clusters of similar documents from among a set of documents, comprising:
- a memory; and
  
  a processor coupled to the memory, wherein the processor is configured to execute the steps of:
  
  (a) selecting a particular document based on rank from among a ranked set of documents;
  
  (b) generating a probe based on the particular document, the probe comprising one or more features;
  
  (c) finding documents that satisfy a similarity condition from among available documents of the set of documents using a search based upon the probe;
  
  (d) associating some or all documents found with a particular cluster of documents; and
  
  (e) repeating steps (a)-(d) using another probe as the probe and using another similarity condition as the similarity condition until a halting condition is satisfied to identify at least one other cluster of documents, wherein those documents of the set of documents previously associated with a cluster of documents are not included among the available documents.
- Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
- - 14. The apparatus of claim 13, wherein selecting the particular document based on rank comprises selecting the highest ranked document of the ranked set of documents.
  - 15. The apparatus of claim 13, wherein generating the probe based on the particular document comprises generating the probe based on the particular document and based on a feature vector used to generate the ranked set of documents.
  - 16. The apparatus of claim 13, comprising generating an additional probe based on said probe and based on a feature vector used to generate the ranked set of documents, such that finding documents in step (c) is based upon said probe and said additional probe.
  - 17. The apparatus of claim 13, further comprising:
    - generating a new probe based on a subset of the documents found at step (c); and
      
      finding documents from among the available documents using a search based upon the new probe, wherein the associating in step (d) is based on documents found using the search based upon the new probe.
  - 18. The apparatus of claim 13, wherein said another similarity condition is the same as the similarity condition.
  - 19. The apparatus of claim 13, wherein the probe comprises the particular document.
  - 20. The apparatus of claim 13, wherein the probe comprises a subset of features selected from the particular document.
  - 21. The apparatus of claim 13, wherein the probe comprises a subset of features selected from multiple documents of the set of documents, and wherein the subset of features includes features of the particular document.
  - 22. The apparatus of claim 13, comprising ranking the documents of said particular cluster and ranking the documents of said at least one other cluster.
  - 23. The apparatus of claim 13, comprising generating an identifier using the probe that describes content of the particular cluster of documents.
  - 24. The apparatus of claim 13, comprising refining the probe by reforming the probe using at least one new document from the set of documents.
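
Ranking the documents inside a cluster (claims 10 and 22) and deriving a cluster identifier from the probe (claims 11 and 23) might look like the following. The claims leave both scoring and labeling open; counting shared terms and joining the first few probe terms are assumptions made for illustration, as are the function names.

```python
from typing import Dict, List, Set


def cluster_label(probe_terms: List[str], max_terms: int = 3) -> str:
    """Describe a cluster by the leading terms of the probe that produced it."""
    return ", ".join(probe_terms[:max_terms])


def rank_within_cluster(
    cluster: List[str],
    features: Dict[str, Set[str]],
    probe: Set[str],
) -> List[str]:
    """Order cluster members by how many probe terms each one contains."""
    # sorted() is stable, so tied documents keep their original relative order.
    return sorted(cluster, key=lambda d: len(probe & features[d]), reverse=True)
```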

Current Assignee: Justsystems Evans Research Incorporated
Original Assignee: Clairvoyance Corporation
Inventors: Evans, David; Bennett, Jeffrey; Sheftel, Victor; Hull, David
Application Number: US11/272,784
Publication Number: US 20070112867A1
US Class Current: 1/1
CPC Class Codes: G06F 16/355 Class or cluster creation o...