
System and method for adaptive categorization for use with dynamic taxonomies

  • US 8,161,028 B2
  • Filed: 12/05/2008
  • Issued: 04/17/2012
  • Est. Priority Date: 12/05/2008
  • Status: Active Grant
First Claim

1. A computer-implemented method for categorizing data points belonging to a data set, said method comprising:

  • matching a textual description of a data point of said data set to category descriptions relating to one or more pre-defined set of categories;

    for said data point having said textual description, generating, using a processor device, one or more preliminary soft seed labels corresponding to one of the one or more pre-defined set of categories and corresponding seed score based on a result of said matching; and

    assigning each of said data points into a predefined number of clusters corresponding to the one or more predefined set of categories using the generated one or more preliminary soft seed labels, said cluster assigning using semi-supervised soft-seeded k-means clustering including:

    assigning an initial centroid to each predefined cluster, wherein, an initial centroid of a particular cluster is computed based on said one or more preliminary soft seed labels, and for a pre-defined cluster not covered by said soft seed labels, computing an initial centroid based on random sampling from all un-labeled data points;

    assigning each of labeled and said un-labeled data points to a cluster in a manner to minimize a distortion measure, said distortion measure including a seed re-assignment penalty value component as a function of said corresponding seed score, said seed re-assignment penalty value assessed upon determining a labeled or un-labeled data point assignment to a category different from the generated preliminary soft seed label;

    updating a centroid value for each said cluster to which a labeled or un-labeled data point has been assigned, and repeating said labeled or un-labeled data point assigning and centroid value updating until no re-assignment of soft seed labels to clusters of different categories occurs.
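The claim combines two stages: text matching that yields soft seed labels with scores, and a k-means variant whose distortion adds a penalty when a seeded point leaves its seed category. The sketch below illustrates both stages in plain Python. It is not the patentee's implementation, and several choices are assumptions the claim does not specify. Matching is scored by Jaccard token overlap with a hypothetical threshold of 0.2. Each point gets at most one seed label, although the claim allows one or more. A seeded cluster's initial centroid is the plain mean of its seed points. A cluster with no seeds starts at one randomly sampled unlabeled point. The penalty is assumed to be linear, `penalty_weight * seed_score`. All function and parameter names (`soft_seeds`, `soft_seeded_kmeans`, `penalty_weight`, etc.) are hypothetical.

```python
import random

# Hypothetical sketch: names, threshold, and penalty form are assumptions,
# not details taken from the patent.


def tokens(text):
    """Lower-cased word set; a deliberately crude stand-in for text matching."""
    return set(text.lower().split())


def soft_seeds(descriptions, category_descs, threshold=0.2):
    """Assumed seeding step: Jaccard overlap between a point's text and each
    category description. Returns {point_index: (category_index, seed_score)}."""
    seeds = {}
    for i, desc in enumerate(descriptions):
        if not desc:
            continue  # no textual description: the point stays unlabeled
        t = tokens(desc)
        scores = [len(t & tokens(c)) / len(t | tokens(c)) for c in category_descs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            seeds[i] = (best, scores[best])
    return seeds


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cost(i, point, c, centroids, seeds, penalty_weight):
    """Distortion of putting point i in cluster c: squared distance plus an
    (assumed linear) re-assignment penalty if c differs from i's seed label."""
    d = sq_dist(point, centroids[c])
    if i in seeds and seeds[i][0] != c:
        d += penalty_weight * seeds[i][1]
    return d


def soft_seeded_kmeans(points, k, seeds, penalty_weight=1.0, max_iter=100, rng=None):
    rng = rng or random.Random(0)
    unlabeled = [i for i in range(len(points)) if i not in seeds]
    # Initial centroids: mean of a cluster's seed points if it has any,
    # otherwise a randomly sampled unlabeled point.
    centroids = []
    for c in range(k):
        members = [points[i] for i, (label, _) in seeds.items() if label == c]
        if members:
            centroids.append(mean(members))
        else:
            pool = unlabeled or range(len(points))
            centroids.append(list(points[rng.choice(pool)]))
    assignment = None
    for _ in range(max_iter):
        # Assign every point (labeled or not) to its minimum-distortion cluster.
        new = [
            min(range(k), key=lambda c: cost(i, p, c, centroids, seeds, penalty_weight))
            for i, p in enumerate(points)
        ]
        if new == assignment:
            break  # no point changed cluster, so no seed was re-assigned either
        assignment = new
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return assignment, centroids


# Toy data (made up for illustration): 2-D features plus optional text.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (3.2, 3.2)]
descriptions = ["cheap laptop computer", "", "running shoes", "", "laptop bag"]
category_descs = ["laptop computer notebook", "shoes sneakers running"]

seeds = soft_seeds(descriptions, category_descs)
labels_plain, _ = soft_seeded_kmeans(points, 2, seeds, penalty_weight=0.0)
labels_pen, _ = soft_seeded_kmeans(points, 2, seeds, penalty_weight=40.0)
print(seeds)
print(labels_plain, labels_pen)
```

In this toy run, point 4 ("laptop bag") is weakly seeded to the first category. With `penalty_weight=0.0` it moves to the second cluster once the centroids shift. With a large enough weight, the penalty keeps it in its seed category. The weight therefore controls how much the text-derived seeds are trusted relative to geometric distance.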
