Thematic clustering

  • US 8,886,651 B1
  • Filed: 12/22/2011
  • Issued: 11/11/2014
  • Est. Priority Date: 12/22/2011
  • Status: Active Grant
First Claim

1. A system, comprising:

  • a processor configured to:

    cluster a data set into one or more initial clusters using a first term space comprising a plurality of keywords;

    determine an initial theme for each initial cluster, wherein the initial theme for each initial cluster is determined based on at least one keyword in the first term space;

    reduce the first term space to create a reduced term space, wherein reducing the first term space includes removing from the first term space a keyword term that is determined to be present in a first document clustered into a first initial cluster and is also determined to be present in a second document clustered into a second initial cluster, and wherein a term frequency for the keyword at least meets a predetermined threshold;

    recluster at least a portion of the data set into one or more baby clusters using the reduced term space, wherein after reclustering, at least one singleton is present, wherein a singleton is an element from the data set that was not assigned to any baby clusters during the reclustering;

    assign at least one singleton to a baby cluster to form one or more renovated clusters;

    determine a renovated theme for at least one of the renovated clusters; and

    provide as output one or more of the renovated clusters and their respective themes; and

  • a memory coupled to the processor and configured to provide the processor with instructions.
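
The steps recited in the claim can be illustrated with a small sketch. It is not the patented implementation: the claim does not name a clustering algorithm, a theme-selection rule, or a singleton-assignment rule, so everything below is an assumption made for illustration. Here a document is clustered under its most frequent keyword from the term space (ties broken alphabetically), that keyword doubles as the cluster's theme, and a singleton joins the baby cluster with which it shares the most vocabulary. All function names (`cluster_by_dominant_term`, `reduce_term_space`, `assign_singletons`, `thematic_clustering`) and the sample documents are hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(doc):
    return doc.lower().split()


def cluster_by_dominant_term(docs, term_space):
    """Group document indices under their most frequent term from term_space.

    Assumption: the dominant term also serves as the cluster's theme.
    Documents containing no term from the space are returned as singletons.
    """
    clusters, singletons = defaultdict(list), []
    for i, doc in enumerate(docs):
        counts = Counter(t for t in tokenize(doc) if t in term_space)
        if counts:
            # Highest count wins; ties are broken alphabetically.
            theme = min(counts, key=lambda t: (-counts[t], t))
            clusters[theme].append(i)
        else:
            singletons.append(i)
    return dict(clusters), singletons


def reduce_term_space(docs, clusters, term_space, threshold):
    """Remove terms that appear in documents of at least two different
    clusters and whose total frequency at least meets the threshold."""
    tokens = [tokenize(d) for d in docs]
    total = Counter(t for toks in tokens for t in toks)
    reduced = set()
    for term in term_space:
        spread = sum(
            any(term in tokens[i] for i in members)
            for members in clusters.values()
        )
        if not (spread >= 2 and total[term] >= threshold):
            reduced.add(term)
    return reduced


def assign_singletons(docs, clusters, singletons):
    """Attach each singleton to the cluster sharing the most vocabulary
    with it (an assumed rule; ties are broken alphabetically by theme)."""
    renovated = {theme: list(members) for theme, members in clusters.items()}
    if not clusters:
        return renovated
    vocab = {
        theme: {t for i in members for t in tokenize(docs[i])}
        for theme, members in clusters.items()
    }
    for s in singletons:
        words = set(tokenize(docs[s]))
        best = min(vocab, key=lambda th: (-len(words & vocab[th]), th))
        renovated[best].append(s)
    return renovated


def thematic_clustering(docs, term_space, threshold=3):
    initial, _ = cluster_by_dominant_term(docs, term_space)
    reduced = reduce_term_space(docs, initial, term_space, threshold)
    baby, singletons = cluster_by_dominant_term(docs, reduced)
    # Renovated clusters keep the theme (key) of the baby cluster they grew from.
    renovated = assign_singletons(docs, baby, singletons)
    return initial, reduced, renovated


docs = [
    "apple pie recipe with apple filling",
    "apple orchard harvest recipe",
    "engine repair manual for engine parts",
    "engine oil change recipe manual",
    "quick harvest recipe ideas",
]
term_space = {"apple", "engine", "recipe", "manual"}

initial, reduced, renovated = thematic_clustering(docs, term_space, threshold=3)
print(initial)          # {'apple': [0, 1], 'engine': [2, 3], 'recipe': [4]}
print(sorted(reduced))  # ['apple', 'engine', 'manual']
print(renovated)        # {'apple': [0, 1, 4], 'engine': [2, 3]}
```

In this toy run, "recipe" occurs in documents of all three initial clusters and four times overall, so it is removed from the term space. Document 4 then matches no remaining keyword, becomes a singleton, and is assigned to the "apple" cluster because it shares "harvest" and "recipe" with that cluster's documents.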
