Agent-based method for distributed clustering of textual information

US 7,805,446 B2
Filed: 10/12/2004
Issued: 09/28/2010
Est. Priority Date: 10/12/2004
Status: Active Grant

3 Assignments

0 Petitions

Abstract

A computer method and system for storing, retrieving and displaying information has a multiplexing agent (20) that calculates a new document vector (25) for a new document (21) to be added to the system and transmits the new document vector (25) to master cluster agents (22) and cluster agents (23) for evaluation. These agents (22, 23) evaluate the vector and return values upstream to the multiplexing agent (20) based on the similarity of the document to the documents stored under their control. The multiplexing agent (20) then sends the document (21) and the document vector (25) to the master cluster agent (22), which forwards them to a cluster agent (23) or creates a new cluster agent (23) to manage the document (21). The system also searches for stored documents according to a search query having at least one term, identifies the documents found in the search, and displays them in a clustering display (80) that indicates the similarity of the documents to each other.
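
The three-tier flow in the abstract can be illustrated with a small, single-process Python sketch. It is not the patented implementation. The class and function names (`ClusterAgent`, `MasterClusterAgent`, `route`) are hypothetical, and several choices are assumptions made for illustration:

- sparse dictionary vectors;
- cosine similarity as the similarity measure;
- a running-sum composite vector;
- a 0.3 threshold for creating a new cluster agent.

Real agents would run on separate computers and exchange messages rather than call each other's methods directly.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

Vector = dict[str, float]  # assumed representation: sparse term -> weight map


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (assumed similarity measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class ClusterAgent:
    """Third tier: holds a composite vector for the documents it controls."""
    composite: Vector = field(default_factory=dict)
    docs: list[str] = field(default_factory=list)

    def similarity(self, vec: Vector) -> float:
        return cosine(vec, self.composite)

    def add(self, doc_id: str, vec: Vector) -> None:
        self.docs.append(doc_id)
        # Assumption: the composite vector is the running sum of member vectors.
        for t, w in vec.items():
            self.composite[t] = self.composite.get(t, 0.0) + w


@dataclass
class MasterClusterAgent:
    """Second tier: fans the vector out to its cluster agents."""
    clusters: list[ClusterAgent] = field(default_factory=list)

    def best_match(self, vec: Vector) -> tuple[float, Optional[ClusterAgent]]:
        """Return the highest similarity reported by any of its cluster agents."""
        best_score, best_agent = 0.0, None
        for agent in self.clusters:
            score = agent.similarity(vec)
            if best_agent is None or score > best_score:
                best_score, best_agent = score, agent
        return best_score, best_agent

    def store(self, doc_id: str, vec: Vector, threshold: float) -> ClusterAgent:
        """Forward to the best cluster agent, or create one if none is close enough."""
        score, agent = self.best_match(vec)
        if agent is None or score < threshold:
            agent = ClusterAgent()
            self.clusters.append(agent)
        agent.add(doc_id, vec)
        return agent


def route(masters: list[MasterClusterAgent], doc_id: str, vec: Vector,
          threshold: float = 0.3) -> tuple[MasterClusterAgent, ClusterAgent]:
    """First tier (multiplexing agent): send the document to the best-matching master."""
    best = max(masters, key=lambda m: m.best_match(vec)[0])
    return best, best.store(doc_id, vec, threshold)
```

Suppose one master already holds a cluster about cats and dogs. Then `route(masters, "c", {"cat": 2.0, "dog": 1.0})` adds the new document to that cluster. A vector that matches nothing scores below the threshold, so a new cluster agent is created for it.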

25 Claims

1. A computer method for storing information in a computer system having at least first and second computers for retrieval and display based on similarity of information, the method comprising:
- a first-tier program module operating on a first computer for determining a new document vector to characterize a new document for comparison of a similarity of the new document to other documents stored in the computer system;
  
  the first-tier program module transmitting the new document to a second-tier program module operating on a second computer in the computer system;
  
  wherein the second-tier program module transmits the document vector to a plurality of third-tier program modules operating on the second computer in the computer system;
  
  the third-tier program modules each storing a composite vector representing the similarity of a respective plurality of documents stored in the second computer under control of the respective third-tier program module; and
  
  the third-tier program modules each receiving the document vector for the new document and comparing the document vector to a respective composite vector to determine similarity of the new document to the plurality of documents stored in the second computer under the control of the third-tier program module; and
  
  the third-tier program modules each returning to the second-tier module a similarity value resulting from comparison of the new document vector to a respective composite vector, the second-tier module returning a best match similarity value to the first-tier module representing a greatest measure of similarity of the new document to a respective plurality of documents stored under control of a respective third-tier program module to determine routing of the document to a selected second-tier program module from among a plurality of second-tier program modules.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 22, 23)
- - 2. The computer method of claim 1, wherein said first-tier program module transmits said new document to the second-tier program module for further transmission to a selected one of said third-tier program modules;
    - and wherein upon receiving said new document, said third-tier program module transmits the new document vector to at least one fourth-tier program module;
      
      wherein the fourth-tier program modules compares the new document vector to a respective composite vector to determine similarity of the new document to the plurality of documents stored in the second computer under the control of the fourth-tier program module;
      
      wherein the fourth-tier program modules transmits a similarity value to the third-tier program module; and
      
      based on evaluation of the similarity value returned by said fourth-tier program module, said third-tier program module determines whether the new document is to be transmitted to the fourth-tier program module.
  - 3. The method of claim 1, wherein based on evaluation of the similarity value returned by the second-tier program module, said first-tier program module transmits the new document to the second-tier program module for storage under control of an existing third-tier and an existing fourth-tier program module, which are in communication with the second-tier program module.
  - 4. The method of claim 1, wherein based on evaluation of the similarity value returned by the second-tier program module, said first-tier program module transmits the new document to the second-tier program module for storage under control of a third-tier program module and a fourth-tier program module, which are to be created by the second-tier program module.
  - 5. The method of claim 1, wherein each new document has been converted to a data file using a tag-identifier language before determining the document vector.
  - 6. The method of claim 1, 2, 3, 4, or 5, further comprising searching for stored documents according to a search query having at least one term and identifying the documents found in the search;
    - and displaying the documents so as to indicate similarity of the documents to each other.
  - 7. The method of claim 6, wherein the documents are displayed as nodes of a tree structure having links and nodes in which similarity of documents is indicated by proximity of nodes to each other and by a length of links connecting the nodes to a common vertex.
  - 8. The method of claim 7, wherein the documents are displayed as nodes of a tree structure that represents documents stored under control of at least one third-tier program module.
  - 9. The method of claim 1, 2, 3, 4, or 5, wherein the information, including the new document, is collected from a plurality of Internet web sites.
  - 10. The method of claim 1, wherein the first-tier program module and the second-tier program module are agent program modules that are originated in JAVA computer language.
  - 11. The method of claim 1, further comprising:
    - the first-tier program module transmitting the new document vector to a second, second-tier program module operating on a third computer in the computer system;
      
      the second, second-tier program module transmitting a similarity value to said first-tier program module which represents a comparison of the new document vector to at least one composite vector characterizing a similarity of a plurality of documents stored in the third computer; and
      
      based on said similarity value received from each of said second-tier program modules, said first-tier program module determining whether said new document should be transmitted to a selected one of said second-tier program modules for storage in the computer system.
  - 22. The method of claim 1, wherein said determining the new document vector further comprises determining the words in the new document, determining the frequency of the words, and executing statistical computations concerning word frequency.
  - 23. The method of claim 1, wherein said new document vector is not based on a pre-defined theme for any of the documents, wherein said composite vector is not based on any theme for any of the documents and wherein said new document vector is compared to said composite vector to determine routing of a document for storage.
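
Claim 22 says the document vector comes from three things: the words in a document, their frequencies, and statistical computations on those frequencies. Claim 23 says neither that vector nor the composite vector depends on a predefined theme. The claims quoted here do not name the statistic, so the Python sketch below makes two assumptions:

- the document vector uses TF-IDF weighting, i.e. term frequency times a smoothed inverse document frequency;
- the composite vector is the mean of its member vectors.

`document_vector`, `composite_vector` and the `doc_freq` corpus counts are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case alphabetic tokens (no stemming or stop-word removal)."""
    return re.findall(r"[a-z]+", text.lower())


def document_vector(text: str, doc_freq: Counter, n_docs: int) -> dict[str, float]:
    """Assumed TF-IDF weighting, derived only from the text (no predefined theme).

    doc_freq[term] is the number of stored documents containing the term,
    and n_docs is the number of stored documents.
    """
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    vec: dict[str, float] = {}
    for term, count in counts.items():
        tf = count / total  # relative word frequency
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0  # smoothed IDF
        vec[term] = tf * idf
    return vec


def composite_vector(vectors: list[dict[str, float]]) -> dict[str, float]:
    """Assumed composite: the element-wise mean (centroid) of member vectors."""
    composite: dict[str, float] = {}
    for vec in vectors:
        for term, weight in vec.items():
            composite[term] = composite.get(term, 0.0) + weight / len(vectors)
    return composite
```

A new document's vector would then be compared with each cluster's composite vector, for example by cosine similarity, to decide where the document is routed for storage.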

12. A computer system for storing, retrieving and displaying information, the computer system being operable on at least one computer having a software operating system, the computer system comprising:
- a first-tier, multiplexing program module running on a first computer for receiving a new document originating from an information source and for calculating a new document vector for the new document, and for transmitting said new document vector to at least one second-tier program module; and
  
  a second-tier program module running on a second computer;
  
  wherein the second-tier program module transmits the document vector to a plurality of third-tier program modules operating on the second computer in the computer system;
  
  the third-tier program modules each storing a composite vector representing the similarity of a respective plurality of documents stored in the second computer under the control of the respective third-tier program module; and
  
  the third-tier program modules each receiving the document vector for the new document and comparing the document vector to a respective composite vector to determine similarity of the new document to the plurality of documents stored in the second computer under control of the third-tier program module;
  
  the third-tier program module each returning a similarity value to the second-tier module resulting from comparison of the new document vector to a respective composite vector, the second-tier module returning a best match similarity value to the first-tier, multiplexing program module representing a greatest measure of similarity of the new document to a respective plurality of documents stored under control of a respective third-tier program module to determine routing of the document to a selected second-tier program module from among a plurality of second-tier program modules.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 24, 25)
- - 13. The computer system of claim 12, wherein said multiplexing program module transmits said new document to the second-tier program module for further transmission to a selected one of said third-tier program modules;
    - and wherein upon receiving said new document, said third-tier program module transmits the new document vector to at least one fourth-tier program module;
      
      wherein the fourth-tier program module compares the new document vector to a respective composite vector to determine similarity of the new document to the plurality of documents stored in the second computer under the control of the fourth-tier program module;
      
      wherein the fourth-tier program module transmits a similarity value to the third-tier program module; and
      
      based on evaluation of the similarity value returned by said fourth-tier program module, said third-tier program module determines whether the new document is to be transmitted to the fourth-tier program module.
  - 14. The computer system of claim 12, wherein based on evaluation of the similarity value returned by the second-tier program module, said first-tier, multiplexing program module transmits the new document to the second-tier program module for storage under control of an existing third-tier and an existing fourth-tier program module, which are in communication with the second-tier program module.
  - 15. The computer system of claim 12, wherein based on evaluation of the similarity value returned by the second-tier program module, said multiplexing program module transmits the new document to the second-tier program module for storage under control of a third-tier program module and a fourth-tier program module, which are provided by the second-tier program module.
  - 16. The computer system of claim 12, further comprising:
    - a second, second-tier program module running on a third computer for receiving the new document from the first-mentioned second-tier program module; and
      
      wherein said first-mentioned second-tier program module running on said second computer and said second, second-tier program module running on said third computer each compare the new document vector to a respective composite vector to determine a similarity value representing the similarity of the new document to the documents already stored in the second computer and the third computer, respectively, and wherein each of said second-tier program modules communicate said respective similarity value to said multiplexing program module for a determination in which of the second computer and the third computer the document should be stored for access.
  - 17. The computer system of claim 12, 13, 14, or 15 further comprising:
    - a computer display and a display program module for displaying the documents so as to indicate similarity of the documents to each other.
  - 18. The computer system of claim 17, wherein the documents are displayed by the display program module as nodes of a tree structure having links and nodes in which similarity of documents is indicated by proximity of nodes to each other and by a length of links connecting the nodes to a common vertex.
  - 19. The computer system of claim 18, wherein the documents are displayed by the display program module as nodes of a tree structure that represents documents stored under control of at least one third-tier program module.
  - 20. The computer system of claim 12, 13, 14, or 15, wherein the information, including the new document, is collected from a plurality of Internet web sites.
  - 21. The computer system of claim 12, wherein the multiplexing program module and the second-tier program modules are agent program modules originated in JAVA computer language.
  - 24. The computer system of claim 12, wherein said new document vector is based on words in the new document, frequency of the words in the new document, and statistical computations concerning word frequency.
  - 25. The computer system of claim 12, wherein said new document vector is not based on a pre-defined theme for any of the documents and wherein said composite vector is not based on any theme for any of the documents and wherein said new document vector is compared to said composite vector to determine routing of a document for storage.
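
Claims 7 and 18 describe displaying documents as nodes of a tree. In that tree, similarity is shown by how close the nodes are and by the length of the links that join them at a common vertex. The claims do not say how the tree is built. One common way to obtain such a tree is agglomerative hierarchical clustering.

The Python sketch below assumes average-linkage clustering with cosine similarity. It records each merge at height `1 - similarity`, so a renderer that draws link length in proportion to height would join similar documents with short links. `build_tree` and the nested `(left, right, height)` tuple format are hypothetical.

```python
import math
from itertools import combinations
from typing import Optional, Union

Tree = Union[str, tuple]  # a document id, or a (left, right, height) merge


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_tree(vectors: dict[str, dict[str, float]]) -> Optional[Tree]:
    """Average-linkage agglomerative clustering of document vectors.

    Each merge becomes (left, right, height) with height = 1 - average
    similarity, so more similar documents meet at a lower common vertex.
    """
    if not vectors:
        return None
    # Active clusters as (subtree, member document ids).
    clusters = [(doc_id, [doc_id]) for doc_id in vectors]
    while len(clusters) > 1:
        best = None  # (average similarity, i, j)
        for i, j in combinations(range(len(clusters)), 2):
            sims = [cosine(vectors[a], vectors[b])
                    for a in clusters[i][1] for b in clusters[j][1]]
            avg = sum(sims) / len(sims)
            if best is None or avg > best[0]:
                best = (avg, i, j)
        avg, i, j = best
        merged = ((clusters[i][0], clusters[j][0], round(1.0 - avg, 4)),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

This naive version recomputes every pairwise average on each merge. That is adequate for a small set of search results but would be too slow for a large collection.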

Current Assignee
UT-Battelle LLC
Original Assignee
UT-Battelle LLC
Inventors
Reed, Joel W., Elmore, Mark T., Potok, Thomas E., Treadwell, Jim N.
Primary Examiner(s)
Wassum; Luke S.
Assistant Examiner(s)
Badawi; Sherief

Application Number

US10/963,241
Publication Number

US 20060080311A1
Time in Patent Office

2,177 Days
Field of Search

707/7, 707/102, 707/737, 707/776, 707/777, 707/778
US Class Current

707/737
CPC Class Codes

G06F 16/355 Class or cluster creation o...
