System and method for adaptive categorization for use with dynamic taxonomies
Abstract
A system, method and computer program product provide a solution to a class of categorization problems using a semi-supervised clustering approach. The method employs a Soft Seeded k-means algorithm, which makes effective use of the side information provided by seeds with a wide range of confidence levels, even when the seeds do not provide complete coverage of the pre-defined categories. The semi-supervised clustering is achieved through the introduction of a seed re-assignment penalty measure and a model selection measure.
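The abstract names a Soft Seeded k-means algorithm and a seed re-assignment penalty but does not define them. The Python sketch below is one possible reading under stated assumptions, not the patented formulation: each soft seed is a hypothetical `(category index, confidence)` pair, and the penalty is modeled as an additive cost `penalty * confidence` charged whenever a seeded point is placed in a cluster other than its seed category. Categories with no seeds start from a random data point, reflecting the case where seeds do not cover every category. The model selection measure is not sketched because the text does not describe it; all function names here are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical soft seed representation: (category index, confidence in [0, 1]).
Seed = Tuple[int, float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(points: List[Sequence[float]], weights: List[float]) -> List[float]:
    """Weighted mean of a non-empty list of equal-length feature vectors."""
    total = sum(weights)
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(len(points[0]))]


def assignment_cost(point: Sequence[float], cluster: int, seed: Optional[Seed],
                    centroids: List[List[float]], penalty: float) -> float:
    """Distance to the centroid plus a penalty for leaving the seed category."""
    cost = sq_dist(point, centroids[cluster])
    if seed is not None and seed[0] != cluster:
        cost += penalty * seed[1]  # assumed form: scaled by seed confidence
    return cost


def soft_seeded_kmeans(points: List[Sequence[float]], seeds: Dict[int, Seed], k: int,
                       penalty: float = 1.0, max_iter: int = 100,
                       rng_seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Return (labels, centroids) after seed-biased k-means clustering."""
    rng = random.Random(rng_seed)
    centroids: List[List[float]] = []
    for c in range(k):
        # Initialize from the confidence-weighted mean of points seeded to category c.
        members = [(points[i], conf) for i, (cat, conf) in seeds.items()
                   if cat == c and conf > 0]
        if members:
            centroids.append(weighted_mean([p for p, _ in members],
                                           [w for _, w in members]))
        else:
            # Seeds need not cover every category: fall back to a random data point.
            centroids.append(list(rng.choice(points)))

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: cheapest cluster under distance plus seed penalty.
        new_labels = [
            min(range(k), key=lambda c: assignment_cost(p, c, seeds.get(i), centroids, penalty))
            for i, p in enumerate(points)
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            # Update step: plain mean of the points currently assigned to cluster c.
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = weighted_mean(members, [1.0] * len(members))
    return labels, centroids


# Point 4 lies nearer cluster 1 but carries a confident seed for category 0.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0], [7.0, 0.5]]
seeds = {0: (0, 1.0), 2: (1, 1.0), 4: (0, 1.0)}
print(soft_seeded_kmeans(points, seeds, k=2, penalty=0.0)[0])    # [0, 0, 1, 1, 1]
print(soft_seeded_kmeans(points, seeds, k=2, penalty=100.0)[0])  # [0, 0, 1, 1, 0]
```

With `penalty=0.0` the procedure reduces to ordinary k-means initialized from the seeds; a large penalty keeps a confidently seeded point in its seed category even when another centroid is closer, while a low-confidence seed is overridden more easily.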
21 Claims
1. A computer-implemented method for categorizing data points belonging to a dataset, each data point having a set of basic features associated therewith, said method comprising:
matching a textual description of a data point of said dataset to category descriptions relating to a pre-defined set of categories;
generating, for said data point having said textual description, a soft seed data structure based on a result of said matching; and
assigning each said data point into a predefined number of clusters corresponding to the predefined categories using the generated soft seed data structures and the set of basic features, said cluster assigning including using a semi-supervised clustering framework.
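Claim 1 leaves open how the matching is performed and what the soft seed data structure contains. As a minimal sketch, assuming a bag-of-words Jaccard similarity (a stand-in chosen here for illustration, not the claimed matching method), the hypothetical helpers below score a data point's description against each category description and emit a `(category index, confidence)` seed only when the best score clears an assumed threshold `min_score`. Data points with no description, or with no sufficiently similar category, receive no seed and would be clustered on their basic features alone.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical soft seed representation: (category index, confidence in [0, 1]).
Seed = Tuple[int, float]


def tokenize(text: str) -> Set[str]:
    """Lower-case alphanumeric tokens of a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def make_soft_seed(description: Optional[str], category_descriptions: List[str],
                   min_score: float = 0.1) -> Optional[Seed]:
    """Match one description against every category description."""
    if not description:
        return None  # no textual description: no seed
    tokens = tokenize(description)
    scores = [jaccard(tokens, tokenize(cd)) for cd in category_descriptions]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < min_score:
        return None  # no category matches well enough
    return best, scores[best]  # confidence is the best similarity score


def build_soft_seeds(descriptions: List[Optional[str]], category_descriptions: List[str],
                     min_score: float = 0.1) -> Dict[int, Seed]:
    """Soft seeds keyed by data point index; unseeded points are omitted."""
    seeds: Dict[int, Seed] = {}
    for i, desc in enumerate(descriptions):
        seed = make_soft_seed(desc, category_descriptions, min_score)
        if seed is not None:
            seeds[i] = seed
    return seeds


categories = ["laptop notebook computer", "mobile phone smartphone", "printer ink toner"]
descriptions = ["Lightweight notebook computer", "Black ink cartridge for printer", None, "garden hose"]
print(build_soft_seeds(descriptions, categories))
# {0: (0, 0.5), 1: (2, 0.3333333333333333)}
```

Using the similarity score itself as the confidence lets weak textual matches still contribute side information, which fits the abstract's point about seeds spanning a wide range of confidence levels; a real system would likely use a stronger text matcher.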
11. A computer program product for categorizing data points belonging to a dataset, each data point having a set of basic features associated therewith, said computer program product comprising:
a computer usable medium having computer usable program code embodied therewith, the computer usable program code comprising:
computer usable program code configured to match a textual description of a data point of said dataset to category descriptions relating to a pre-defined set of categories;
computer usable program code configured to generate, for said data point having said textual description, a soft seed data structure based on a result of said matching; and
computer usable program code configured to assign each said data point into a predefined number of clusters corresponding to the predefined categories using the generated soft seed data structures and the set of basic features, said cluster assigning including using a semi-supervised clustering framework.
21. A method of deploying a computer program product for categorizing data points belonging to a dataset, each data point having a set of basic features associated therewith, wherein, when executed, the computer program performs the steps of:
matching a textual description of a data point of said dataset to category descriptions relating to a pre-defined set of categories;
generating, for said data point having said textual description, a soft seed data structure based on a result of said matching; and
assigning each said data point into a predefined number of clusters corresponding to the predefined categories using the generated soft seed data structures and the set of basic features, said cluster assigning including using a semi-supervised clustering framework.
Specification