
Parallel document clustering process

  • US 5,864,855 A
  • Filed: 02/26/1996
  • Issued: 01/26/1999
  • Est. Priority Date: 02/26/1996
  • Status: Expired due to Fees
First Claim

1. In an arrangement of parallel processors in a computer information processing system, a parallel clustering method for examining preselected documents and grouping similar documents in the parallel processors for subsequent retrieval in an electronic digital format from the computer information processing system, the steps comprising:

  • converting each preselected document into an electronic document in digital format;

    converting each electronic document into a vector, whereby a vector is a weighted list of the occurrence of different words and terms that appear in the document;

    selecting a first electronic document and designating the vector of the first electronic document as a first cluster vector whereby a cluster vector is the mathematical average of all of the document vectors having similar characteristics, and assigning the first cluster vector to a first processor of the parallel processors;

    selecting a second electronic document and comparing the vector of the second electronic document with the first cluster vector to determine if the second document vector has similar characteristics, and assigning the second document vector to the first cluster vector if they have similar characteristics or designating the second document vector as a second cluster vector and assigning the second cluster vector to a second processor of the parallel processors if there are different characteristics; and

    selecting each subsequent electronic document and comparing the vector of each subsequent electronic document with all existing cluster vectors simultaneously on each processor having a cluster vector, and assigning each subsequent document vector to a parallel processor having the most similar characteristics or designating the subsequent document vector as a subsequent cluster vector and assigning the subsequent cluster vector to a processor of the parallel processors if there are different characteristics.
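The steps recited in claim 1 can be illustrated with a minimal single-pass sketch. It is not the patented implementation. Several details are assumptions, because the claim does not fix them: raw term counts stand in for the "weighted list", cosine similarity stands in for "similar characteristics", the `threshold` value is arbitrary, and a thread pool stands in for the arrangement of parallel processors (one comparison task per cluster vector). All function and class names are hypothetical.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def to_vector(text):
    """Convert a document to a term vector (assumption: raw word counts as weights)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors (assumed similarity measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Cluster:
    """Holds the running sum of member vectors; the cluster vector is their average."""

    def __init__(self, vec):
        self.total = Counter(vec)
        self.size = 1

    def centroid(self):
        return {t: w / self.size for t, w in self.total.items()}

    def add(self, vec):
        self.total.update(vec)  # Counter.update adds counts term by term
        self.size += 1


def cluster_documents(docs, threshold=0.3, workers=4):
    """Assign each document to the most similar cluster, or start a new cluster."""
    clusters, labels = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for text in docs:
            vec = to_vector(text)
            # Compare the document against every existing cluster vector;
            # each comparison is submitted as a separate task to the pool.
            sims = list(pool.map(lambda c: cosine(vec, c.centroid()), clusters))
            best = max(range(len(sims)), key=sims.__getitem__, default=None)
            if best is not None and sims[best] >= threshold:
                clusters[best].add(vec)
                labels.append(best)
            else:
                # No sufficiently similar cluster: the document seeds a new one.
                clusters.append(Cluster(vec))
                labels.append(len(clusters) - 1)
    return labels


docs = [
    "apple banana apple",
    "banana apple fruit",
    "rocket engine thrust",
    "engine rocket fuel",
]
labels = cluster_documents(docs)
```

In this sketch the first document always seeds the first cluster, matching the claim's "first cluster vector" step. Python threads do not give true CPU parallelism for this workload; a process pool or one cluster per physical processor would be closer to the claimed arrangement.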
