Classifier Tuning Based On Data Similarities
Abstract
A probabilistic classifier is used to classify data items in a data stream. The probabilistic classifier is trained, and an initial classification threshold is set, using unique training and evaluation data sets (i.e., data sets that do not contain duplicate data items). Unique data sets are used for training and in setting the initial classification threshold so as to prevent the classifier from being improperly biased as a result of similarity rates in the training and evaluation data sets that do not reflect similarity rates encountered during operation. During operation, information regarding the actual similarity rates of data items in the data stream is obtained and used to adjust the classification threshold such that misclassification costs are minimized given the actual similarity rates.
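As a minimal sketch of the "unique data sets" idea, the snippet below removes duplicates from a labeled data set before it would be used for training or for setting the initial threshold. It assumes that "duplicate" means an exact match, which is a simplification of similarity, and that an item is assigned to the class when its score meets or exceeds the threshold. The names `unique_data_set` and `classify` are hypothetical and do not come from the patent.

```python
def unique_data_set(labeled_items):
    """Return (item, label) pairs with exact duplicates removed.

    Assumption: two items are duplicates only if they compare equal.
    The first occurrence of each item is kept, in original order.
    """
    seen = set()
    unique = []
    for item, label in labeled_items:
        if item not in seen:
            seen.add(item)
            unique.append((item, label))
    return unique


def classify(score, threshold):
    """Assumed decision rule: class member when score >= threshold."""
    return score >= threshold


# Hypothetical labeled data: the first message appears twice.
training = [("buy now", True), ("meeting at 3", False), ("buy now", True)]
unique_training = unique_data_set(training)
```

Training and initial threshold selection would then use `unique_training`, so that duplication rates in the collected data do not bias the classifier.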
16 Claims
1. A method for adjusting a classification threshold of a data item classifier based on received data items, wherein the data item classifier classifies an incoming data item as a member of a particular class when a comparison of a classification output for the incoming data item to the classification threshold indicates the data item belongs to the particular class, the method comprising:
determining the similarity rate for unique data items in the received data items;
determining a threshold value for the classification threshold that reduces misclassification costs based on, at least in part, the similarity rate for unique data items in the received data items; and
setting the classification threshold to the threshold value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
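The three steps of claim 1 can be illustrated with a short sketch under stated assumptions; it is not the patented implementation. It assumes that the similarity rate of a unique item is its exact-match count in the received items, that a labeled evaluation set with classifier scores is available, and that misclassification cost is a weighted sum of false positives and false negatives. The function names, the cost values, and the candidate thresholds are all hypothetical.

```python
from collections import Counter


def similarity_rates(received_items):
    """Step 1 (assumed form): count occurrences of each unique item."""
    return Counter(received_items)


def misclassification_cost(threshold, scored, rates, fp_cost, fn_cost):
    """Total cost at a threshold, weighting each unique item by its rate.

    scored maps item -> (classifier score, true class membership).
    Assumed decision rule: class member when score >= threshold.
    """
    cost = 0.0
    for item, (score, is_member) in scored.items():
        predicted = score >= threshold
        weight = rates.get(item, 1)  # unseen items count once
        if predicted and not is_member:
            cost += fp_cost * weight
        elif not predicted and is_member:
            cost += fn_cost * weight
    return cost


def tune_threshold(scored, rates, candidates, fp_cost, fn_cost):
    """Step 2 (assumed form): pick the lowest-cost candidate threshold."""
    return min(
        candidates,
        key=lambda t: misclassification_cost(t, scored, rates, fp_cost, fn_cost),
    )


# Hypothetical evaluation data: item -> (score, is class member).
scored = {"a": (0.6, True), "b": (0.7, False), "c": (0.2, False)}
candidates = [0.5, 0.65, 0.8]

# Without duplicates, every unique item has rate 1.
initial = tune_threshold(scored, similarity_rates(["a", "b", "c"]),
                         candidates, fp_cost=2.0, fn_cost=1.0)

# In the received items, "a" occurs three times.
rates = similarity_rates(["a", "a", "a", "b", "c"])
# Step 3: set the classification threshold to the chosen value.
threshold = tune_threshold(scored, rates, candidates, fp_cost=2.0, fn_cost=1.0)
```

In this toy data the threshold chosen on unique items is 0.8, while weighting by the observed rates moves it to 0.5, because missing the frequently repeated item "a" becomes more costly than the single false positive on "b".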
9. A computer-usable medium having a computer program embodied thereon for adjusting a classification threshold of a data item classifier based on received data items, wherein the data item classifier classifies an incoming data item as a member of a particular class when a comparison of a classification output for the incoming data item to the classification threshold indicates the data item belongs to the particular class, the computer program comprising instructions for causing a computer to perform the following operations:
determine the similarity rate for unique data items in the received data items;
determine a threshold value for the classification threshold that reduces misclassification costs based on, at least in part, the similarity rate for unique data items in the received data items; and
set the classification threshold to the threshold value.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
Specification