Method of analyzing documents

  • US 20060259481A1
  • Filed: 05/12/2005
  • Published: 11/16/2006
  • Est. Priority Date: 05/12/2005
  • Status: Active Grant
First Claim

1. A method of analyzing a plurality of documents, comprising:

  • collecting and filtering terms from a plurality of documents;
  • identifying a term-frequency vector for each of the documents;
  • identifying a term-frequency matrix, wherein rows of the matrix comprise values for the term-frequency vectors;
  • projecting the term-frequency matrix onto a lower dimensional space using latent semantic analysis, to create a transformed term matrix;
  • developing a correlation matrix using the rows of the transformed term matrix;
  • creating a concept graph of connected components using a concept threshold, where each connected component is a set of terms that corresponds to a concept; and
  • clustering documents that contain concept term sets together.
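
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions the claim does not state: NumPy is used for the singular value decomposition, the "transformed term matrix" is taken to have one row per term (the right singular vectors scaled by the singular values), cosine similarity between those rows stands in for the correlation measure, and a document joins a concept's cluster only if it contains every term in the set. The names DOCS, STOPWORDS, term_frequency_matrix, lsa_term_matrix, concept_components and cluster_documents are hypothetical and chosen for illustration only.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical toy corpus and stopword list (illustration only).
DOCS = [
    "The cat and the dog.",
    "A dog and a cat; the cat and the dog.",
    "Stock and bond.",
    "The bond and the stock; a stock and a bond; bond, stock.",
]
STOPWORDS = {"the", "a", "and"}


def term_frequency_matrix(docs):
    """Collect and filter terms, then build one term-frequency row per document."""
    tokenized = [
        [t for t in re.findall(r"[a-z]+", d.lower()) if t not in STOPWORDS]
        for d in docs
    ]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    index = {term: i for i, term in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for row, tokens in enumerate(tokenized):
        for term, count in Counter(tokens).items():
            tf[row, index[term]] = count
    return vocab, tf


def lsa_term_matrix(tf, k):
    """Project onto k latent dimensions via truncated SVD (assumed form of LSA).

    Returns one k-dimensional row per term: right singular vectors scaled by
    the corresponding singular values.
    """
    _, s, vt = np.linalg.svd(tf, full_matrices=False)
    return vt[:k].T * s[:k]


def concept_components(term_vectors, vocab, threshold):
    """Link terms whose similarity meets the threshold; return connected components.

    Assumption: cosine similarity between term rows is used as the correlation.
    """
    norms = np.linalg.norm(term_vectors, axis=1, keepdims=True)
    unit = term_vectors / np.maximum(norms, 1e-12)  # guard against zero rows
    adjacent = (unit @ unit.T) >= threshold
    seen, concepts = set(), []
    for start in range(len(vocab)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first walk over the concept graph
            node = stack.pop()
            component.append(vocab[node])
            for other in np.flatnonzero(adjacent[node]):
                other = int(other)
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        concepts.append(sorted(component))
    return concepts


def cluster_documents(tf, vocab, concepts):
    """Group documents that contain every term of a concept (assumed rule)."""
    index = {term: i for i, term in enumerate(vocab)}
    clusters = {}
    for concept in concepts:
        cols = [index[term] for term in concept]
        members = np.flatnonzero((tf[:, cols] > 0).all(axis=1))
        clusters[tuple(concept)] = [int(d) for d in members]
    return clusters


vocab, tf = term_frequency_matrix(DOCS)
concepts = concept_components(lsa_term_matrix(tf, k=2), vocab, threshold=0.5)
clusters = cluster_documents(tf, vocab, concepts)
```

In this toy corpus the terms fall into two unrelated groups, so the concept graph splits into two components and each pair of documents lands in its own cluster. The number of latent dimensions (k=2) and the concept threshold (0.5) are arbitrary choices for the example; the claim leaves both unspecified.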
