Large scale machine learning systems and methods

  • US 8,364,618 B1
  • Filed: 06/04/2012
  • Issued: 01/29/2013
  • Est. Priority Date: 11/14/2003
  • Status: Active Grant
First Claim

1. A method comprising:

  • generating, by one or more processors, a model based on a plurality of features associated with documents that include spam documents and non-spam documents, the generating of the model including:

      identifying, by the one or more processors, a condition associated with two or more features of the plurality of features,

      receiving, by the one or more processors and from a plurality of devices associated with the documents, statistics associated with the identified condition, a particular statistic, of the received statistics, being received from a particular device, of the plurality of devices, and the particular statistic indicating a particular weight, associated with the identified condition, for the particular device,

      generating a candidate rule for the model based on the condition and the received statistics,

      determining whether to add the candidate rule to the model,

      upon determining that the candidate rule should not be added to the model, setting a weight, for the candidate rule, to a value that indicates that the candidate rule should not be added to the model, and

      generating, by the one or more processors and based on the received statistics, a composite weight associated with the condition, the composite weight indicating how relevant the condition is, with respect to other conditions, in determining whether a document is to be classified as spam, the other conditions being associated with respective subsets of the plurality of features that differ from the condition;

    receiving, by the one or more processors, a particular document, the particular document being associated with one or more features of the plurality of features;

    determining, by the one or more processors and based on applying the model to the one or more features, to classify the particular document as a spam document; and

    storing, by the one or more processors, information regarding the particular document based on the particular document being classified as the spam document.
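
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how the composite weight is computed, how the add-or-reject decision is made, or how the model scores a document. The sketch assumes a plain mean of the per-device weights, a hypothetical magnitude threshold (`min_abs_weight`), a rejection weight of zero (`REJECTED_WEIGHT`), and a logistic score over matching rules. All names (`Rule`, `build_rule`, `classify`, `spam_log`) are illustrative.

```python
import math
from dataclasses import dataclass

# Assumed sentinel: a weight of zero marks a candidate rule as "not added".
REJECTED_WEIGHT = 0.0


@dataclass(frozen=True)
class Rule:
    condition: frozenset  # two or more features that must all be present
    weight: float         # composite weight, or REJECTED_WEIGHT


def composite_weight(device_weights):
    """Combine per-device weights for one condition (assumption: plain mean)."""
    return sum(device_weights) / len(device_weights)


def build_rule(condition, device_weights, min_abs_weight=0.5):
    """Generate a candidate rule and decide whether it joins the model.

    Assumption: a candidate is rejected when its composite weight is too
    close to zero to be informative; rejected rules get REJECTED_WEIGHT.
    """
    weight = composite_weight(device_weights)
    if abs(weight) < min_abs_weight:
        weight = REJECTED_WEIGHT
    return Rule(frozenset(condition), weight)


def spam_probability(model, features):
    """Sum the weights of rules whose condition holds, then squash to (0, 1)."""
    score = sum(rule.weight for rule in model if rule.condition <= features)
    return 1.0 / (1.0 + math.exp(-score))


def classify(model, features, threshold=0.5):
    """Return True when the document is classified as spam."""
    return spam_probability(model, features) >= threshold


# Per-device statistics for three conditions (made-up numbers).
model = [
    build_rule({"free", "winner"}, [2.0, 3.0, 2.5]),
    build_rule({"meeting", "agenda"}, [-2.0, -1.0]),
    build_rule({"the", "and"}, [0.1, -0.1]),  # uninformative, so rejected
]

spam_log = []  # stands in for "storing information regarding the document"
document = {"free", "winner", "the", "and"}
if classify(model, document):
    spam_log.append(sorted(document))
```

Using a zero weight for rejected candidates keeps them in the rule list without letting them affect the score, which is one simple reading of "a value that indicates that the candidate rule should not be added".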
