System And Method For Scoring Concepts In A Document Set

  • US 20100049708A1
  • Filed: 10/26/2009
  • Published: 02/25/2010
  • Est. Priority Date: 07/25/2003
  • Status: Active Grant
First Claim

1. A system for scoring concepts in a document set, comprising:

  • a database to maintain a set of documents;

    a concept identification module to identify concepts comprising two or more terms extracted from the document set and to designate each document having one or more of the concepts as a candidate seed document;

    a scoring module to calculate a score for each of the concepts identified within each candidate seed document based on a frequency of occurrence, concept weight, structural weight, and corpus weight;

    a vector module to form a vector for each candidate seed document comprising the concepts located in that candidate seed document and the associated concept scores;

    a document comparison module to compare the vector for each candidate seed document with a center of one or more clusters each comprising thematically-related documents and to select at least one of the candidate seed documents that is sufficiently distinct from the other candidate seed documents as a seed document for a new cluster; and

    a clustering module to place each of the unselected candidate seed documents into one of the clusters having a most similar cluster center.
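
The modules recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation. The claim says only that a score is "based on" the four factors, so multiplying them is an assumption, as are the use of cosine similarity, the averaged cluster center, the `seed_threshold` value, and every function name below.

```python
import math

def score_concept(frequency, concept_weight, structural_weight, corpus_weight):
    # Assumption: the four factors are multiplied; the claim does not fix a formula.
    return frequency * concept_weight * structural_weight * corpus_weight

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as {concept: score}.
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def center(vectors):
    # Assumption: a cluster center is the per-concept mean of its member vectors.
    total = {}
    for vec in vectors:
        for c, w in vec.items():
            total[c] = total.get(c, 0.0) + w
    return {c: w / len(vectors) for c, w in total.items()}

def cluster(doc_vectors, seed_threshold=0.25):
    # Pass 1: a candidate becomes the seed of a new cluster when it is
    # sufficiently distinct from every existing cluster center. During this
    # pass each cluster holds only its seed, so the center is the seed vector.
    clusters = {}
    for doc_id, vec in doc_vectors.items():
        centers = [center([doc_vectors[d] for d in members])
                   for members in clusters.values()]
        if all(cosine(vec, c) < seed_threshold for c in centers):
            clusters[doc_id] = [doc_id]

    # Pass 2: place each unselected candidate into the cluster whose
    # center is most similar to its vector.
    for doc_id, vec in doc_vectors.items():
        if doc_id in clusters:
            continue
        best = max(
            clusters,
            key=lambda s: cosine(vec, center([doc_vectors[d] for d in clusters[s]])),
        )
        clusters[best].append(doc_id)
    return clusters

# Hypothetical document vectors: concept -> score, as a vector module might emit.
docs = {
    "a": {"data mining": 2.0, "text analysis": 1.0},
    "b": {"data mining": 1.0, "text analysis": 2.0},
    "c": {"patent law": 3.0},
}
result = cluster(docs)
```

With these made-up vectors, documents "a" and "c" share no concepts and each seeds its own cluster, while "b" overlaps heavily with "a" and joins that cluster.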

  • 11 Assignments