Parallel document clustering process

US 5,864,855 A
Filed: 02/26/1996
Issued: 01/26/1999
Est. Priority Date: 02/26/1996
Status: Expired due to Fees

First Claim

1. In an arrangement of parallel processors in a computer information processing system, a parallel clustering method for examining preselected documents and grouping similar documents in the parallel processors for subsequent retrieval in an electronic digital format from the computer information processing system, the steps comprising:

converting each preselected document into an electronic document in digital format;

converting each electronic document into a vector, whereby a vector is a weighted list of the occurrence of different words and terms that appear in the document;

selecting a first electronic document and designating the vector of the first electronic document as a first cluster vector whereby a cluster vector is the mathematical average of all of the document vectors having similar characteristics, and assigning the first cluster vector to a first processor of the parallel processors;

selecting a second electronic document and comparing the vector of the second electronic document with the first cluster vector to determine if the second document vector has similar characteristics, and assigning the second document vector to the first cluster vector if they have similar characteristics or designating the second document vector as a second cluster vector and assigning the second cluster vector to a second processor of the parallel processors if there are different characteristics; and

selecting each subsequent electronic document and comparing the vector of each subsequent electronic document with all existing cluster vectors simultaneously on each processor having a cluster vector, and assigning each subsequent document vector to a parallel processor having the most similar characteristics or designating the subsequent document vector as a subsequent cluster vector and assigning the subsequent cluster vector to a processor of the parallel processors if there are different characteristics.
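
The steps of claim 1 can be sketched in Python as follows. This is a minimal single-process illustration, not the patented implementation. The claim does not name a similarity measure, a weighting scheme, or a threshold, so cosine similarity, raw word counts, and a cutoff of 0.5 are all assumptions here, and the names `to_vector`, `Cluster`, and `cluster_documents` are hypothetical. Each `Cluster` object stands in for the cluster vector held by one processor.

```python
import math
from collections import Counter


def to_vector(text):
    """Weighted list of word occurrences (raw counts are an assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings (assumed measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Cluster:
    """One cluster vector, standing in for the state held by one processor."""

    def __init__(self, vec):
        self.total = Counter(vec)  # running sum of member document vectors
        self.members = 1

    def centroid(self):
        # The cluster vector is the average of its member document vectors.
        return {t: w / self.members for t, w in self.total.items()}

    def add(self, vec):
        self.total.update(vec)
        self.members += 1


def cluster_documents(texts, threshold=0.5):
    """Single pass: join the most similar cluster or start a new one."""
    clusters, labels = [], []
    for text in texts:
        vec = to_vector(text)
        # In the claimed method these comparisons run simultaneously,
        # one per processor; here they run one after another.
        scores = [cosine(vec, c.centroid()) for c in clusters]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            clusters[best].add(vec)
            labels.append(best)
        else:
            clusters.append(Cluster(vec))
            labels.append(len(clusters) - 1)
    return labels
```

The first document always starts cluster 0 because there is nothing to compare it with, which matches the "first cluster vector" step of the claim. The threshold controls how readily a new cluster is created.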

Abstract

A computer information processing system utilizes parallel processors for organizing and clustering a large number of documents into a large number of clusters for information analysis and retrieval. After the documents are translated into electronic digital documents, each document is converted into a vector based on a weighted list of the occurrence of different words and terms that appear in the document. The document vectors are grouped together into cluster vectors on different parallel processors according to similarities. New document vectors are simultaneously compared with existing cluster vectors in the different parallel processors.
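
The simultaneous comparison of a new document vector with the existing cluster vectors might look like the following Python sketch. It is an illustration under stated assumptions only: worker threads stand in for the parallel processors, cosine similarity is an assumed measure that the abstract does not specify, and `most_similar_cluster` is a hypothetical name.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts (assumed measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_similar_cluster(doc_vec, cluster_vecs):
    """Score doc_vec against every cluster vector concurrently.

    Returns (index, score) of the best match. One worker per cluster
    vector mimics one processor per cluster.
    """
    if not cluster_vecs:
        raise ValueError("no cluster vectors to compare against")
    with ThreadPoolExecutor(max_workers=len(cluster_vecs)) as pool:
        # pool.map preserves input order, so scores[i] belongs to cluster i.
        scores = list(pool.map(lambda c: cosine(doc_vec, c), cluster_vecs))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Threads in CPython do not speed up pure-Python arithmetic, so this shows only the structure of the comparison: every cluster is scored independently, and a single reduction picks the winner.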

Current Assignee: The United States Of America As Represented By The Secretary Of The Army
Original Assignee: The United States Of America As Represented By The Secretary Of The Army
Inventors: Frieder, Ophir; Ruocco, Anthony S.
Primary Examiner(s): Black, Thomas G.
Assistant Examiner(s): Robinson, Greta Lee
Application Number: US08/606,951
Time in Patent Office: 1,065 Days
Field of Search: 395/611, 395/602, 395/605, 707/100, 707/2, 707/5, 707/10
US Class Current: 1/1
CPC Class Codes:
G06F 16/355   Class or cluster creation o...
Y10S 707/99935   Query augmenting and refini...
Y10S 707/99936   Pattern matching access
