Large scale machine learning systems and methods
First Claim
1. A method comprising:
- generating, by one or more processors, a model based on a plurality of features associated with documents that include spam documents and non-spam documents, the generating of the model including:
  - identifying, by the one or more processors, a condition associated with two or more features of the plurality of features,
  - receiving, by the one or more processors and from a plurality of devices associated with the documents, statistics associated with the identified condition, a particular statistic, of the received statistics, being received from a particular device, of the plurality of devices, and the particular statistic indicating a particular weight, associated with the identified condition, for the particular device,
  - generating a candidate rule for the model based on the condition and the received statistics,
  - determining whether to add the candidate rule to the model,
  - upon determining that the candidate rule should not be added to the model, setting a weight, for the candidate rule, to a value that indicates that the candidate rule should not be added to the model, and
  - generating, by the one or more processors and based on the received statistics, a composite weight associated with the condition, the composite weight indicating how relevant the condition is, with respect to other conditions, in determining whether a document is to be classified as spam, the other conditions being associated with respective subsets of the plurality of features that differ from the condition;
- receiving, by the one or more processors, a particular document, the particular document being associated with one or more features of the plurality of features;
- determining, by the one or more processors and based on applying the model to the one or more features, to classify the particular document as a spam document; and
- storing, by the one or more processors, information regarding the particular document based on the particular document being classified as the spam document.
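The steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not state how the per-device statistics are combined or how the decision to add a rule is made, so the plain summation, the `min_abs_weight` cutoff, the use of `0.0` as the "not added" weight, and every name below are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    condition: frozenset  # features that must all be present (two or more)
    weight: float         # composite weight; 0.0 marks "not added" (assumed sentinel)


def composite_weight(device_weights):
    """Combine the per-device weights for one condition (assumed: plain sum)."""
    return sum(device_weights)


def build_rule(condition, device_weights, min_abs_weight=0.5):
    """Form a candidate rule and decide whether it joins the model.

    A rejected candidate keeps its condition but gets the sentinel weight 0.0,
    so it contributes nothing when the model is applied.
    """
    w = composite_weight(device_weights)
    if abs(w) < min_abs_weight:  # assumed acceptance test
        w = 0.0
    return Rule(frozenset(condition), w)


def classify(model, doc_features, threshold=0.0):
    """Return True (spam) if the summed weights of matching rules exceed threshold."""
    feats = set(doc_features)
    score = sum(r.weight for r in model if r.condition <= feats)
    return score > threshold


# Hypothetical per-device weights for three candidate conditions.
model = [
    build_rule({"free", "winner"}, [0.9, 1.1, 0.7]),
    build_rule({"meeting", "agenda"}, [-0.8, -0.6]),
    build_rule({"hello", "the"}, [0.1, -0.05]),  # too weak: weight set to 0.0
]
```

With this toy model, `classify(model, {"free", "winner", "hello", "the"})` flags the document as spam, while a document containing only `meeting` and `agenda` is not flagged. Storing information about the flagged document, the final step of the claim, is left out of the sketch.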
Abstract
A system for generating a model is provided. The system generates, or selects, candidate conditions and generates, or otherwise obtains, statistics regarding the candidate conditions. The system also forms rules based, at least in part, on the statistics and the candidate conditions and selectively adds the rules to the model.
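As a rough illustration of what generating candidate conditions and obtaining statistics about them might look like, the sketch below treats every pair of co-occurring features as a candidate condition and counts how many spam and non-spam documents match it. The pairwise conjunctions, the smoothed log-odds statistic, and all function names are assumptions; the abstract does not specify any of them.

```python
from collections import Counter
from itertools import combinations
import math


def candidate_conditions(docs):
    """Yield each feature pair that co-occurs in at least one document."""
    seen = set()
    for features, _is_spam in docs:
        for pair in combinations(sorted(features), 2):
            if pair not in seen:
                seen.add(pair)
                yield pair


def condition_statistics(docs, conditions):
    """Count, per condition, the matching spam and non-spam documents."""
    spam, ham = Counter(), Counter()
    for features, is_spam in docs:
        for cond in conditions:
            if set(cond) <= features:
                (spam if is_spam else ham)[cond] += 1
    return spam, ham


def log_odds(spam_count, ham_count):
    """Add-one smoothed log-odds of spam given the condition (assumed statistic)."""
    return math.log((spam_count + 1) / (ham_count + 1))


# Hypothetical labelled documents: (feature set, is_spam).
docs = [
    ({"free", "winner"}, True),
    ({"free", "winner", "now"}, True),
    ({"agenda", "meeting"}, False),
]
conds = list(candidate_conditions(docs))
spam, ham = condition_statistics(docs, conds)
```

A positive `log_odds(spam[c], ham[c])` suggests that condition `c` leans toward spam, and a negative value suggests the opposite. A system along these lines could then form rules from such statistics and keep only the ones that pass some acceptance test.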
48 Citations
20 Claims
Dependent claims of claim 1: 2, 3, 4, 5, 6, 7
8. A system comprising:
- one or more processors to:
  - generate a model based on a plurality of features associated with documents that include spam documents and non-spam documents; the one or more processors, when generating the model, being further to:
    - identify a condition associated with two or more features of the plurality of features,
    - receive, from a plurality of devices associated with the documents, statistics associated with the identified condition, a particular statistic, of the received statistics, being received from a particular device, of the plurality of devices, and the particular statistic indicating a particular weight, associated with the identified condition, for the particular device,
    - generate a candidate rule for the model based on the condition and the received statistics,
    - determine whether to add the candidate rule to the model, and upon determining that the candidate rule should not be added to the model, set a weight, for the candidate rule, to a value that indicates that the candidate rule should not be added to the model, and
    - generate, based on the received statistics, a composite weight associated with the condition, the composite weight indicating how relevant the condition is, with respect to other conditions, in determining whether a document is to be classified as spam, the other conditions being associated with respective subsets of the plurality of features that differ from the condition;
  - receive a particular document, the particular document being associated with one or more features of the plurality of features;
  - classify, based on applying the model to the one or more features, the particular document as a spam document; and
  - processing the particular document based on classifying the particular document as the spam document.
Dependent claims of claim 8: 9, 10, 11, 12, 13, 14
15. A non-transitory memory device, comprising:
- one or more instructions which, when executed by one or more processors, cause the one or more processors to:
  - identify a condition associated with two or more features, of a plurality of features associated with documents that include spam documents and non-spam documents;
  - receive, from a plurality of devices associated with the documents, statistics associated with the identified condition, a particular statistic, of the received statistics, being received from a particular device, of the plurality of devices, and the particular statistic indicating a particular weight, associated with the identified condition, for the particular device;
  - generate a candidate rule for the model based on the condition and the received statistics;
  - determine whether to add the candidate rule to the model;
  - upon determining that the candidate rule should not be added to the model, set a weight, for the candidate rule, to a value that indicates that the candidate rule should not be added to the model;
  - generate, based on the received statistics, a composite weight associated with the condition, the composite weight indicating how relevant the condition is, with respect to other conditions, in determining whether a document is to be classified as spam, the other conditions being associated with respective subsets of the plurality of features that differ from the condition;
  - receive a particular document, the particular document being associated with one or more features of the plurality of features; and
  - classify, based on applying the composite weight to the one or more features, the particular document as a spam document.
Dependent claims of claim 15: 16, 17, 18, 19, 20
Specification