Automatic incremental labeling of document clusters

  • US 9,002,848 B1
  • Filed: 06/22/2012
  • Issued: 04/07/2015
  • Est. Priority Date: 12/27/2011
  • Status: Active Grant
First Claim

1. A computer implemented method comprising:

  • assembling a set of documents, the set of documents including a first plurality of previously clustered documents and a second plurality of documents, each of the first plurality of previously clustered documents having at least one label identifying a topic to which content of the document relates;

    partitioning, by a non-transitory computing device, documents from the set of documents into multiple clusters;

    determining, by the non-transitory computing device, that a dominant topic exists within a first cluster of said multiple clusters;

    determining, by the computing device, (i) a purity score representing a first ratio of a number of documents having a label identifying the dominant topic in the first cluster to a total number of previously clustered documents within the first cluster and (ii) a confidence measure representing a second ratio of the total number of previously clustered documents in the first cluster to a size of the first cluster, wherein the size of the first cluster equals a total number of documents included within the first cluster; and

    labeling, by the computing device, at least documents from the second plurality of documents within said one of the multiple clusters with the label identifying the dominant topic when the purity score exceeds a first predetermined threshold and the confidence score exceeds a second predetermined threshold.
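
The two ratios and the threshold test recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each previously clustered document carries a single label (the claim allows more than one), that unlabeled documents are represented by `None`, and that the dominant topic is simply the most frequent label in the cluster. The function names `cluster_scores` and `propagate_label` and the threshold values in the usage lines are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def cluster_scores(
    labels: Sequence[Optional[str]],
) -> Optional[Tuple[str, float, float]]:
    """Return (dominant_topic, purity, confidence) for one cluster.

    `labels` holds one entry per document in the cluster: a topic label for
    a previously clustered document, or None for an unlabeled document.
    Returns None when the cluster contains no labeled documents.
    """
    size = len(labels)  # total number of documents in the cluster
    known = [label for label in labels if label is not None]
    if not known:
        return None

    # Assumption: the dominant topic is the most frequent existing label.
    dominant, dominant_count = Counter(known).most_common(1)[0]

    # Purity: documents labeled with the dominant topic / labeled documents.
    purity = dominant_count / len(known)
    # Confidence: labeled documents / all documents in the cluster.
    confidence = len(known) / size
    return dominant, purity, confidence


def propagate_label(
    labels: Sequence[Optional[str]],
    purity_threshold: float,
    confidence_threshold: float,
) -> Optional[str]:
    """Return the label to apply to the unlabeled documents, or None.

    The label is propagated only when both scores strictly exceed their
    thresholds, mirroring the "exceeds" wording of the claim.
    """
    scores = cluster_scores(labels)
    if scores is None:
        return None
    dominant, purity, confidence = scores
    if purity > purity_threshold and confidence > confidence_threshold:
        return dominant
    return None


# Usage: 4 "sports" documents, 1 "politics" document, 5 unlabeled documents.
cluster = ["sports"] * 4 + ["politics"] + [None] * 5
print(cluster_scores(cluster))
print(propagate_label(cluster, purity_threshold=0.7, confidence_threshold=0.4))
```

In this example five of the ten documents are previously labeled and four of those five share one label, so the purity is 4/5 and the confidence is 5/10. Because the comparison is strict, a purity threshold of exactly 0.8 would block propagation for this cluster.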
