System and method for selection of important attributes
Abstract
A system and method determine how well various attributes in a record discriminate between different values of a chosen label attribute. An attribute is considered relevant if it discriminates between different values of the chosen label attribute, either alone or in conjunction with other attributes. According to the present invention, a user selects a label attribute from a set of records, each record having a plurality of attributes. Next, the user selects one or more first important attributes. The present invention then generates one or more second important attributes which, together with the user-chosen first important attributes, discriminate well between different values of the label attribute. A measure called "purity" (a number from 0 to 100) indicates how well each attribute discriminates between the different values of the label attribute. The purity measure allows the attributes to be ranked by importance.
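The abstract describes purity only as a number from 0 to 100 and does not give its formula. As a minimal sketch, assuming purity is the normalized reduction in label entropy, 100 × (1 − H(label | attributes) / H(label)), the ranking idea could look like the following. The function names `entropy`, `purity`, and `rank_by_purity` and the dictionary-based record layout are hypothetical and not taken from the patent.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of label values."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def purity(records, attributes, label):
    """Assumed purity score in [0, 100]: 100 means the given attributes
    fully determine the label, 0 means they carry no information about it.
    This formula is an assumption, not the patent's definition."""
    base = entropy([rec[label] for rec in records])
    if base == 0:
        return 100.0  # only one label value present: trivially pure
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[a] for a in attributes)].append(rec[label])
    conditional = sum(len(g) / len(records) * entropy(g)
                      for g in groups.values())
    return 100.0 * (1.0 - conditional / base)


def rank_by_purity(records, attributes, label):
    """Order attributes from most to least discriminating (individually)."""
    return sorted(attributes,
                  key=lambda a: purity(records, [a], label),
                  reverse=True)
```

Any score that maps "perfectly separates the label values" to 100 and "no better than ignoring the attribute" to 0 would fit the abstract equally well; entropy is used here only because it is a common choice.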
30 Claims
1. A method for determining important attributes for discriminating between different values of a label attribute, the method comprising the steps of:
receiving a set of records, each record having a plurality of attributes;
permitting a user to choose the label attribute;
permitting a user to choose at least one first attribute that is considered important;
generating at least one second important attribute, said generated at least one second important attribute together with the said chosen at least one first important attribute discriminate well between the different values of the label attribute;
generating non-cumulative purity for each of said chosen at least one first important attribute and each of said generated at least one second important attribute, said non-cumulative purity indicating how well each first and second important attribute individually discriminates between different values of the label attribute; and
generating cumulative purity for each of said chosen at least one first important attribute and each of said generated at least one second important attribute, said cumulative purity indicating how well a respective important attribute in combination with other first and second important attributes discriminates between different values of the label attribute.
Dependent claims: 2, 3, 4, 5
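The difference between non-cumulative and cumulative purity in claim 1 can be illustrated with a small sketch. It assumes that purity is an entropy-based score scaled to 0 to 100, and that the cumulative purity of an attribute is computed over that attribute together with all attributes listed before it; the claim text fixes neither point, so both are assumptions, and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of label values."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def purity(records, attributes, label):
    """Assumed purity: 100 * (1 - H(label | attributes) / H(label))."""
    base = entropy([rec[label] for rec in records])
    if base == 0:
        return 100.0
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[a] for a in attributes)].append(rec[label])
    conditional = sum(len(g) / len(records) * entropy(g)
                      for g in groups.values())
    return 100.0 * (1.0 - conditional / base)


def purity_table(records, important, label):
    """For each important attribute, in order, return a tuple
    (name, non_cumulative, cumulative). Cumulative purity uses the
    attribute plus every attribute listed before it (an assumption)."""
    return [(attr,
             purity(records, [attr], label),
             purity(records, important[:i + 1], label))
            for i, attr in enumerate(important)]


# Label is the XOR of a and b: neither attribute helps alone,
# but the pair determines the label completely.
xor_records = [{"a": a, "b": b, "label": a ^ b}
               for a in (0, 1) for b in (0, 1)]
table = purity_table(xor_records, ["a", "b"], "label")
# table is [("a", 0.0, 0.0), ("b", 0.0, 100.0)]
```

The XOR data shows why both numbers are worth reporting: an attribute with zero purity alone can still be important in conjunction with others.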
6. A method for determining important attributes for discriminating between the different values of a label attribute, the method comprising the steps of:
receiving a set of records, each record having a plurality of attributes;
permitting a user to choose the label attribute;
permitting a user to choose at least one first important attribute that is considered important;
generating at least one second important attribute, wherein each generated at least one second important attribute together with the said chosen at least one first important attribute discriminate better than at least one of the other attributes between the different values of the label attribute;
generating purity for said generated at least one second important attribute, said purity indicating how well each generated at least one second important attribute discriminates different values of the chosen label attribute; and
generating rank of each said generated at least one second important attribute based on its purity.
Dependent claims: 7, 8, 9, 10, 11, 12
13. A method for determining the most important attributes from a set of records in a database, said most important attributes discriminate different values of a chosen label attribute, the method comprising the steps of:
(a) receiving a set of records having a total of n attributes;
(b) receiving a parameter k, said parameter k indicates number of important variables desired;
(c) selecting a first set of attributes S;
(d) discretizing non-categorical attributes in S;
(e) discretizing non-categorical attributes not in S;
(f) computing separability criterion for each attribute not in S conditioned on first set of attributes S;
(g) adding attribute A to S, A being the attribute with the highest separability criterion; and
(h) if the number of attributes in S is less than k input attributes, returning to step (e), else, computing separability criteria for all attributes not in S.
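Steps (c) through (h) of claim 13 describe a greedy forward selection loop, which can be sketched as follows. This is a simplified illustration, not the patented implementation: it assumes every attribute is already categorical (so discretization steps (d) and (e) are skipped), and it assumes the separability criterion is the reduction in conditional label entropy when a candidate attribute joins S. The names `conditional_entropy` and `select_important` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(records, attributes, label):
    """H(label | attributes) in bits; with no attributes this is H(label)."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[a] for a in attributes)].append(rec[label])
    n = len(records)
    h = 0.0
    for g in groups.values():
        for count in Counter(g).values():
            p = count / len(g)
            h -= (len(g) / n) * p * math.log2(p)
    return h


def select_important(records, label, k, initial=()):
    """Grow the set S (starting from the user's choices) until it has
    k attributes, adding the best-scoring candidate each round."""
    selected = list(initial)                       # step (c)
    candidates = [a for a in records[0]
                  if a != label and a not in selected]

    def criterion(attr):
        # Assumed separability criterion, conditioned on the current S:
        # how much label entropy drops when attr is added to S.
        return (conditional_entropy(records, selected, label)
                - conditional_entropy(records, selected + [attr], label))

    while len(selected) < k and candidates:        # step (h) loop test
        best = max(candidates, key=criterion)      # steps (f) and (g)
        selected.append(best)
        candidates.remove(best)

    # Final part of step (h): score every attribute left outside S.
    remaining = {a: criterion(a) for a in candidates}
    return selected, remaining
```

Because the criterion is conditioned on S, an attribute that is useless alone can still be picked once its partner is in S, which matches the claim's emphasis on attributes that discriminate in combination.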
14. A system for determining important attributes for discriminating between different values of a label attribute, comprising:
means for receiving a set of records, each record having a plurality of attributes;
means for permitting a user to choose the label attribute;
means for permitting a user to choose at least one first important attribute that is considered important;
means for generating at least one second important attribute, said generated at least one second important attribute together with the said chosen at least one first important attribute discriminate better than at least one of the other attributes between different values of the label attribute;
means for generating non-cumulative purity of each of said chosen at least one first attribute and each of said generated at least one second important attributes, said non-cumulative purity indicating how well each first and second important attribute individually discriminates between different values of the label attribute; and
means for generating cumulative purity of each of said chosen at least one first important attribute and each of said generated at least one second important attribute, said cumulative purity indicates how well a respective important attribute in combination with all other first and second important attributes discriminates between different values of the label attribute.
Dependent claims: 15, 16, 17, 18
19. A system for determining important attributes for discriminating between different values of a label attribute, comprising:
means for receiving a set of records, each record having a plurality of attributes;
means for permitting a user to choose the label attribute;
means for permitting a user to choose at least one first important attribute that is considered important;
means for generating at least one second important attribute, wherein each generated at least one second important attribute together with the said chosen at least one first important attribute discriminate better than at least one of the other attributes between the different values of the label attribute;
means for generating purity for said generated at least one second important attribute, said purity indicating how well each generated at least one second important attribute discriminates different values of the chosen label attribute; and
means for generating rank of each said generated at least one second important attribute based on its purity.
Dependent claims: 20, 21, 22, 23, 24
25. A system for determining the most important attributes from a set of records in a database, said most important attributes discriminate different values of a chosen label attribute, comprising:
(a) means for receiving a set of records having a total of n attributes;
(b) means for receiving a parameter k, said parameter k indicates number of important variables desired;
(c) means for selecting a first set of attributes S;
(d) means for discretizing non-categorical attributes in S;
(e) means for discretizing non-categorical attributes not in S;
(f) means for computing separability criterion for each attribute not in S conditioned on first set of attributes S;
(g) means for adding attribute A to S, A being the attribute with the highest separability criterion;
(h) means for returning to step (e), if the number of attributes in S is less than k input attributes; and
(i) means for computing separability criteria for all attributes not in S.
26. A computer program product for determining important attributes for discriminating between different values of a label attribute, comprising:
means for receiving a set of records, each record having a plurality of attributes;
means for permitting a user to choose the label attribute;
means for permitting a user to choose at least one first important attribute that is considered important;
means for generating at least one second important attribute, said generated at least one second important attribute together with the said chosen at least one first important attribute discriminate better than at least one of the other attributes between different values of the label attribute;
means for generating non-cumulative purity of each of said chosen at least one first attribute and each of said generated at least one second important attributes, said non-cumulative purity indicating how well each first and second important attribute individually discriminates between different values of the label attribute; and
means for generating cumulative purity of each of said chosen at least one first important attribute and each of said generated at least one second important attribute, said cumulative purity indicates how well a respective important attribute in combination with all other first and second important attributes discriminates between different values of the label attribute.
Dependent claims: 27, 28, 29, 30
Specification