Computer-implemented system and method for clustering similar documents

  • US 8,725,736 B2
  • Filed: 02/14/2013
  • Issued: 05/13/2014
  • Est. Priority Date: 08/31/2001
  • Status: Active Grant
First Claim

1. A computer-implemented system for clustering similar documents, comprising:

    concepts for a set of documents;

    an occurrence module to determine occurrence frequencies of each concept in the document set;

    a distance module to calculate an inner product quantifying a similarity for each of the documents in the set with one or more clusters of documents based on the occurrence frequencies of the concepts;

    a map module to map each document to each of the document clusters based on the inner product, to identify those documents with the smallest inner products as most relevant to a theme, and to generate a matrix as a representation of the document and cluster mappings; and

    a processor to execute the modules.
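
The data flow recited in the claim (concept occurrence frequencies, an inner-product measure between documents and clusters, and a document-by-cluster matrix) can be illustrated with a minimal sketch. This is not the patented implementation. All function names, the whitespace tokenization, and the sample data are hypothetical. The sketch also assumes one possible reading of "smallest": the normalized inner product is converted to an angle, so that a smaller value indicates a closer match.

```python
import math
from collections import Counter


def occurrence_frequencies(documents, concepts):
    """Count how often each concept occurs in each document (assumed: whitespace tokens)."""
    rows = []
    for text in documents:
        counts = Counter(text.lower().split())
        rows.append([counts[concept] for concept in concepts])
    return rows


def inner_product(u, v):
    """Plain inner (dot) product of two equal-length frequency vectors."""
    return sum(a * b for a, b in zip(u, v))


def angle(u, v):
    """Angle in radians derived from the normalized inner product (an assumption of this sketch)."""
    norm = math.sqrt(inner_product(u, u) * inner_product(v, v))
    if norm == 0:
        return math.pi / 2  # treat an all-zero vector as orthogonal to everything
    cosine = max(-1.0, min(1.0, inner_product(u, v) / norm))
    return math.acos(cosine)


def map_documents(doc_vectors, cluster_vectors):
    """Build the document-by-cluster matrix: one row per document, one column per cluster."""
    return [[angle(d, c) for c in cluster_vectors] for d in doc_vectors]


def closest_documents(matrix, cluster_index):
    """Return indices of the documents with the smallest value for the given cluster."""
    best = min(row[cluster_index] for row in matrix)
    return [i for i, row in enumerate(matrix) if math.isclose(row[cluster_index], best)]


# Hypothetical sample data: two concepts, three documents, two cluster vectors.
concepts = ["cat", "dog"]
documents = ["cat cat dog", "dog dog", "cat"]
cluster_vectors = [[1, 0], [0, 1]]

frequencies = occurrence_frequencies(documents, concepts)
matrix = map_documents(frequencies, cluster_vectors)
```

Each row of `matrix` holds one document's measure against every cluster, so `closest_documents(matrix, 0)` selects the documents nearest the first cluster under the angle reading assumed here.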

  • 6 Assignments