Method and system for optimally searching a document database using a representative semantic space

  • US 6,847,966 B1
  • Filed: 04/24/2002
  • Issued: 01/25/2005
  • Est. Priority Date: 04/24/2002
  • Status: Active Grant
First Claim

1. A method for comparing documents for similarity by representative latent semantic analysis comprising:

  • generating a weighted term-by-document matrix for a group of selected documents, said group of selected documents being representative of a similarity criterion;

  • decomposing said weighted term-by-document matrix into a term matrix of terms occurring in said group of selected documents and a reduced concepts matrix of concepts, at least a portion of said concepts indicative of latent semantics in at least one of said selected documents;

  • generating a first pseudo-document vector utilizing a first document and said reduced concepts matrix;

  • generating a second pseudo-document vector utilizing a second document and said reduced concepts matrix;

  • comparing said first pseudo-document vector to said second pseudo-document vector; and

  • determining similarity of said first document and said second document based on the comparison of said first pseudo-document vector to said second pseudo-document vector.
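The claimed steps can be sketched in Python as below. This is a minimal illustration, not the patented implementation. The claim does not specify a weighting scheme, decomposition method, projection formula or comparison measure, so the sketch assumes the following:

  • a log(1 + tf) · smoothed-idf weighting;
  • a truncated SVD (A ≈ U_k S_k V_kᵀ) as the decomposition;
  • standard LSA "folding-in" (d̂ = dᵀ U_k S_k⁻¹) to form each pseudo-document vector;
  • cosine similarity as the comparison;
  • whitespace tokenisation.

All function names are hypothetical.

```python
from collections import Counter

import numpy as np


def build_vocabulary(docs):
    """Sorted list of every token appearing in the representative documents."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})


def weighted_term_document_matrix(docs, vocab):
    """Return (A, idf) with A[i, j] = log(1 + tf_ij) * idf_i (assumed weighting)."""
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok, tf in Counter(doc.lower().split()).items():
            counts[index[tok], j] = tf
    n = len(docs)
    df = np.count_nonzero(counts, axis=1)      # document frequency of each term
    idf = np.log((1 + n) / (1 + df)) + 1.0     # smoothed idf, always positive
    return np.log1p(counts) * idf[:, None], idf


def reduced_concept_space(A, k):
    """Truncated SVD A ~ U_k S_k V_k^T: keep the k strongest concepts."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)  # s is sorted descending
    return U[:, :k], s[:k]


def pseudo_document(text, vocab, idf, U_k, s_k):
    """Fold a document into concept space: d_hat = d^T U_k S_k^{-1}."""
    index = {t: i for i, t in enumerate(vocab)}
    d = np.zeros(len(vocab))
    for tok, tf in Counter(text.lower().split()).items():
        if tok in index:  # terms outside the representative vocabulary are ignored
            d[index[tok]] = tf
    d = np.log1p(d) * idf  # same weighting as the representative matrix
    return (d @ U_k) / s_k


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu and nv else 0.0


# Hypothetical representative group encoding the similarity criterion.
representative = [
    "latent semantic analysis of patent documents",
    "semantic indexing with singular value decomposition",
    "patent search using latent semantic indexing",
    "matrix decomposition for document retrieval",
]
vocab = build_vocabulary(representative)
A, idf = weighted_term_document_matrix(representative, vocab)
U_k, s_k = reduced_concept_space(A, k=2)

doc1 = pseudo_document("latent semantic patent search", vocab, idf, U_k, s_k)
doc2 = pseudo_document("semantic indexing of patent documents", vocab, idf, U_k, s_k)
doc3 = pseudo_document("cooking recipes tonight", vocab, idf, U_k, s_k)
print("doc1 vs doc2:", cosine(doc1, doc2))
print("doc1 vs doc3:", cosine(doc1, doc3))
```

One quick sanity check on the folding-in step: folding a representative document back in reproduces its own column of V_kᵀ. A document that shares no vocabulary with the representative group folds to the zero vector and scores 0. In this sketch, the choice of representative documents is what defines the concept space in which any two documents are compared.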
