Classifier tuning based on data similarities

  • US 7,089,241 B1
  • Filed: 12/22/2003
  • Issued: 08/08/2006
  • Est. Priority Date: 01/24/2003
  • Status: Expired due to Fees
First Claim

1. A machine readable medium storing one or more programs that implement an e-mail classifier for determining whether at least one received e-mail should be classified as spam, the one or more programs comprising instructions for causing one or more processing devices to perform the following operations:

  • obtain feature data for the received e-mail by determining whether the received e-mail has a predefined set of features;

  • train a scoring classifier using a set of unique training e-mails;

  • provide a classification output, using the scoring classifier, based on the obtained feature data, wherein the classification output is indicative of whether or not the received e-mail is spam;

  • compare the provided classification output to a classification threshold, wherein the received e-mail is classified as spam when the comparison of the classification output to the classification threshold indicates the received e-mail is spam;

  • determine at least one similarity rate for at least one e-mail, wherein the at least one similarity rate is the rate at which e-mails, which are substantially similar to the at least one e-mail, are received by the e-mail classifier;

  • select and set a value for the classification threshold, wherein selecting and setting the value for the classification threshold includes:

      • selecting and setting an initial value for the classification threshold that reduces misclassification costs based on a set of unique evaluation e-mails; and

      • selecting and setting a new value for the classification threshold that reduces the misclassification costs based at least on the determined at least one similarity rate.
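
The two-stage threshold selection recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: the names `Email`, `misclassification_cost`, and `select_threshold`, the cost values, the candidate thresholds, and in particular the choice to weight each e-mail's error cost by its similarity rate. The claim only says the new threshold reduces misclassification costs "based at least on" the similarity rate.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class Email:
    score: float                  # classification output (higher = more spam-like)
    is_spam: bool                 # known label of the evaluation e-mail
    similarity_rate: float = 1.0  # assumed: rate at which substantially similar e-mails arrive


def misclassification_cost(emails: Iterable[Email], threshold: float,
                           fp_cost: float = 10.0, fn_cost: float = 1.0,
                           use_rates: bool = False) -> float:
    """Total cost of errors at a given threshold (hypothetical cost model)."""
    total = 0.0
    for e in emails:
        # Assumption: an error on an e-mail is weighted by its similarity rate.
        weight = e.similarity_rate if use_rates else 1.0
        predicted_spam = e.score >= threshold
        if predicted_spam and not e.is_spam:
            total += fp_cost * weight      # legitimate mail flagged as spam
        elif not predicted_spam and e.is_spam:
            total += fn_cost * weight      # spam let through
    return total


def select_threshold(emails: Sequence[Email], candidates: Sequence[float],
                     use_rates: bool = False) -> float:
    """Pick the candidate threshold with the lowest misclassification cost."""
    return min(candidates,
               key=lambda t: misclassification_cost(emails, t, use_rates=use_rates))


# Made-up evaluation set: one low-scoring spam message arrives in bulk.
evaluation = [
    Email(score=0.9, is_spam=True),
    Email(score=0.4, is_spam=True, similarity_rate=100.0),
    Email(score=0.5, is_spam=False),
    Email(score=0.1, is_spam=False),
]
candidates = [0.05, 0.3, 0.45, 0.7, 0.95]

initial = select_threshold(evaluation, candidates)                # unique e-mails only
tuned = select_threshold(evaluation, candidates, use_rates=True)  # rate-weighted
```

With this made-up data the initial threshold is 0.7, and the rate-weighted threshold drops to 0.3, because missing the bulk-mailed spam message becomes more costly than the single false positive it takes to catch it.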
