Automated machine-learning classification using feature scaling
Abstract
Provided are systems, methods and techniques for machine-learning classification. In one representative embodiment, an item having values for a plurality of different features in a feature set is obtained, together with scores for the different features. The score for a given feature is a measure of prediction ability for that feature and was calculated as a function of a plurality of different occurrence metrics of the feature. The values for the features are scaled according to the scores for the features, and the item is classified by inputting the adjusted feature set values for the item into a previously trained classifier.
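The classification-time path described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: scaling is taken to mean multiplying each feature value by its score, and the "previously trained classifier" is stood in for by a hypothetical linear model whose `weights` and `bias` are made up for the example. None of these names come from the patent.

```python
def scale_values(values, scores):
    """Multiply each feature value by that feature's score.

    Assumption: 'scaled according to the scores' is read here as a
    simple element-wise product; the source does not fix the rule.
    """
    return [v * s for v, s in zip(values, scores)]


def classify(values, scores, weights, bias):
    """Hypothetical previously trained linear classifier.

    Returns True (positive class) when the weighted sum of the
    adjusted feature values plus the bias is greater than zero.
    """
    adjusted = scale_values(values, scores)
    margin = sum(w * x for w, x in zip(weights, adjusted)) + bias
    return margin > 0


# Illustrative numbers only: a strongly predictive first feature
# (score 2.0) and a weakly predictive second feature (score 0.1).
scores = [2.0, 0.1]
weights = [1.0, -1.0]
bias = -1.0
print(classify([1.0, 1.0], scores, weights, bias))
print(classify([0.0, 1.0], scores, weights, bias))
```

Because the scores are applied before the classifier sees the item, a feature with a low score contributes little to the decision even if its raw value is large.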
18 Claims
1. A method of automated machine-learning classification, comprising:
- establishing, within a computer, an original feature set, each feature of the original feature set having a predictive value, the predictive value of some features being uncertain for characterizing expected input items during classification thereof;
- selecting with the computer a feature set, the feature set being a subset of the original feature set;
- obtaining to the computer a number of training items having values for a plurality of different features in the feature set;
- calculating with the computer scores for the different features of the feature set using a scoring technique, the score for a given feature being a measure of prediction ability for the given feature and calculated as $S = |a\,F^{-1}(\mathrm{tpr}) - b\,F^{-1}(\mathrm{fpr})|$, where S is the score, tpr is the true positive rate of the given feature equal to a number of positive training cases containing a subject feature divided by a number of positive training cases, fpr is the false positive rate of the given feature equal to a number of negative training cases containing the subject feature divided by a number of negative training cases, $|*|$ is an absolute value, $F^{-1}(*)$ is an inverse of an assumed probability distribution function, and a and b are constants;
- scaling the values for the features of the feature set with the computer according to the scores for said features as adjusted feature values;
- generating a classifier with the computer;
- training the classifier using the adjusted feature values for the features of the feature set;
- scaling the values for the features in the feature set of an input item with the computer according to the scores as adjusted feature values of the input item; and
- classifying an input item using the computer and the adjusted feature values for the input item into the previously trained classifier.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
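The scoring and scaling steps of claim 1 might be sketched as below. Several choices are assumptions for illustration and are not stated in the claim: F is taken to be the standard normal cumulative distribution function, a and b default to 1, an item "contains" a feature when its value is nonzero, scaling is a plain multiplication by the score, and the rates are clamped by a hypothetical `EPS` because the inverse normal CDF is undefined at exactly 0 and 1.

```python
from statistics import NormalDist

# Assumption: keep tpr and fpr strictly inside (0, 1) so inv_cdf is defined.
EPS = 0.0005


def feature_score(pos_with, n_pos, neg_with, n_neg, a=1.0, b=1.0):
    """S = |a*F^-1(tpr) - b*F^-1(fpr)|, with F assumed standard normal."""
    inv = NormalDist().inv_cdf
    tpr = min(max(pos_with / n_pos, EPS), 1 - EPS)
    fpr = min(max(neg_with / n_neg, EPS), 1 - EPS)
    return abs(a * inv(tpr) - b * inv(fpr))


def score_features(items, labels):
    """Score every feature seen in the training items.

    items  : list of dicts mapping feature name -> value
    labels : list of bools, True for a positive training case
    Requires at least one positive and one negative case.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    features = {f for item in items for f in item}
    scores = {}
    for f in features:
        # Assumption: a nonzero value means the case contains the feature.
        pos_with = sum(1 for it, y in zip(items, labels) if y and it.get(f, 0) != 0)
        neg_with = sum(1 for it, y in zip(items, labels) if not y and it.get(f, 0) != 0)
        scores[f] = feature_score(pos_with, n_pos, neg_with, n_neg)
    return scores


def scale_item(item, scores):
    """Multiply each value by its feature's score (unseen features get 0)."""
    return {f: v * scores.get(f, 0.0) for f, v in item.items()}
```

The same `scale_item` call would be applied to training items before training and to each input item before classification, so that the classifier sees consistently scaled values in both phases.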
9. A method of automated machine learning classification, comprising:
- obtaining, to a first pre-processing portion of a computer, a training item having values for a plurality of different features in a feature set;
- calculating with a scoring technique implemented by the first pre-processing portion of the computer scores for the different features, the score for a given feature being calculated as $S = |a\,F^{-1}(\mathrm{tpr}) - b\,F^{-1}(\mathrm{fpr})|$, where S is the score, tpr is the true positive rate of the given feature equal to a number of positive training cases containing a subject feature divided by a number of positive training cases, fpr is the false positive rate of the given feature equal to a number of negative training cases containing the subject feature divided by a number of negative training cases, $|*|$ is an absolute value, $F^{-1}(*)$ is an inverse of an assumed probability distribution function, and a and b are constants;
- scaling the values for the features with the first pre-processing portion of the computer according to the scores for said features, thereby obtaining adjusted feature set values for the training item;
- training a supervised machine-learning classifier using the adjusted feature set values from the first pre-processing portion of the computer;
- obtaining to a second pre-processing portion of a computer an unlabeled item having values for the plurality of different features in the feature set;
- calculating with the scoring technique implemented by the second pre-processing portion of the computer, further scores for the different features, the further score for a given feature being calculated as S;
- scaling the adjusted feature set values using the second pre-processing portion of the computer according to the further scores for said features, thereby obtaining modified feature set values for the unlabeled item;
- scaling the values for the features in the feature set of an input item with the computer according to the scores as adjusted feature values of the input item; and
- classifying the unlabeled item by inputting the modified feature set values into the supervised machine-learning classifier.
Dependent claims: 10, 11, 12, 13, 14.
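As a worked illustration of the score formula recited in the claims, suppose (hypothetically) that there are 100 positive and 100 negative training cases, that a feature occurs in 80 of the positives and 10 of the negatives, that a = b = 1, and that F is the standard normal cumulative distribution function Φ. None of these numbers come from the patent; they are chosen only to make the arithmetic concrete.

```latex
\[
\mathrm{tpr} = \frac{80}{100} = 0.8, \qquad
\mathrm{fpr} = \frac{10}{100} = 0.1
\]
\[
S = \left| \Phi^{-1}(0.8) - \Phi^{-1}(0.1) \right|
  \approx \left| 0.8416 - (-1.2816) \right|
  = 2.1232
\]
```

If scaling is taken to be multiplication by the score (an assumption), a raw feature value of 3 would become roughly 3 × 2.1232 ≈ 6.37. A feature that occurs equally often in positive and negative cases has tpr = fpr, giving S = 0 when a = b, so it would be scaled away entirely.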
15. A non-transitory computer-readable medium storing computer-executable process steps for machine-learning classification, said process steps comprising:
- establishing an original feature set, each feature of the original feature set having a predictive value, the predictive value of some features being uncertain for characterizing expected input items during classification thereof;
- selecting a feature set, the feature set being a subset of the original feature set;
- obtaining a number of training items having values for a plurality of different features in the feature set;
- calculating with the computer scores for the different features of the feature set using a scoring technique, the score for a given feature being a measure of prediction ability for the given feature and calculated as $S = |a\,F^{-1}(\mathrm{tpr}) - b\,F^{-1}(\mathrm{fpr})|$, where S is the score, tpr is the true positive rate of the given feature equal to a number of positive training cases containing a subject feature divided by a number of positive training cases, fpr is the false positive rate of the given feature equal to a number of negative training cases containing the subject feature divided by a number of negative training cases, $|*|$ is an absolute value, $F^{-1}(*)$ is an inverse of an assumed probability distribution function, and a and b are constants;
- scaling the values for the features of the feature set according to the scores for said features as adjusted feature values;
- generating a classifier;
- training the classifier using the adjusted feature values of the feature set;
- scaling the values for the features in the feature set of an input item with the computer according to the scores as adjusted feature values of the input item; and
- classifying an input item using the adjusted feature values for the input item into the previously trained classifier.
Dependent claims: 16, 17, 18.
Specification