Compliance model training to classify landing page content that violates content item distribution guidelines
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for analyzing content item compliance with specified guidelines. In one aspect, a method includes receiving training data that specify manual classifications of content items and feature values for each of the content items, where each manual classification specifies whether the content item is a violating content item. Using the training data, a compliance model is trained to classify an unclassified content item as a violating content item based on the feature values of the unclassified content item. A determination is made that the compliance model has an accuracy measure that meets a threshold accuracy measure. In response to determining that the accuracy measure for the compliance model meets the accuracy threshold, unclassified content items are classified using the feature values for the unclassified content items, and data specifying the classifications are provided.
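The flow summarized in the abstract (train on manually classified items, check accuracy against a threshold, and only then classify new items) can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes a toy nearest-centroid classifier, a held-out set of manually classified items for measuring accuracy, and two hypothetical feature values per item (the share of flagged terms in the content item and on its landing page). None of these choices comes from the source.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Features = Sequence[float]
Example = Tuple[Features, bool]  # (feature values, manual "violating" label)


@dataclass
class CentroidModel:
    """Toy compliance model: one centroid per manual class (an assumption)."""
    violating: List[float]
    compliant: List[float]

    def predict(self, x: Features) -> bool:
        # True when x is strictly closer to the violating centroid.
        d_v = sum((a - b) ** 2 for a, b in zip(x, self.violating))
        d_c = sum((a - b) ** 2 for a, b in zip(x, self.compliant))
        return d_v < d_c


def _centroid(rows: List[Features]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(examples: List[Example]) -> CentroidModel:
    """Fit the toy model from manually classified items."""
    return CentroidModel(
        violating=_centroid([x for x, y in examples if y]),
        compliant=_centroid([x for x, y in examples if not y]),
    )


def accuracy(model: CentroidModel, labeled: List[Example]) -> float:
    """Fraction of held-out items whose prediction matches the manual label."""
    return sum(model.predict(x) == y for x, y in labeled) / len(labeled)


def classify_if_accurate(
    model: CentroidModel,
    holdout: List[Example],
    unclassified: List[Features],
    threshold: float = 0.9,  # hypothetical threshold accuracy measure
) -> Optional[List[bool]]:
    """Classify only when the accuracy measure meets the threshold."""
    if accuracy(model, holdout) < threshold:
        return None
    return [model.predict(x) for x in unclassified]


# Hypothetical data: [flagged-term share in item, flagged-term share on landing page]
training = [([0.9, 0.8], True), ([0.8, 0.9], True),
            ([0.1, 0.2], False), ([0.2, 0.1], False)]
holdout = [([0.7, 0.9], True), ([0.1, 0.1], False)]
model = train(training)
print(classify_if_accurate(model, holdout, [[0.8, 0.7], [0.05, 0.3]]))
```

Returning `None` when the threshold is not met is one way to make the gating explicit to the caller. A real system would more likely retrain or collect more manual classifications at that point.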
21 Claims
1. A method performed by data processing apparatus having memory, a processor, and code stored in the memory and executed in the processor, the method comprising:
receiving by the processor training data that specify manual classifications of content items and feature values for each of the content items, the manual classification for each of the content items specifying whether the content item is a violating content item that violates content item distribution guidelines, the feature values specifying one or more characteristics of the content items and characteristics of landing pages to which the content items link;
training by the processor a compliance model using the training data, the compliance model being trained to classify an unclassified content item as a violating content item based on the feature values of the unclassified content item;
determining by the processor that the compliance model has an accuracy measure that meets a threshold accuracy measure;
in response to determining that the accuracy measure for the compliance model meets the accuracy threshold, classifying by the processor unclassified content items using the feature values for the unclassified content items; and
providing by the processor data specifying the classifications of the unclassified content items;
wherein classifying unclassified content items comprises classifying at least one unclassified content item as a suspicious content item, and further comprising:
providing the suspicious content item to a rater for manual classification;
receiving data specifying a manual classification of the suspicious content item; and
updating the compliance model using the manual classification and the feature values of the suspicious content item.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
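The closing steps of claim 1 (flagging an item as suspicious, sending it to a rater, and updating the model with the rater's answer) can be sketched as follows. This is an illustrative sketch only. It assumes an online logistic model, a probability band of 0.3 to 0.7 that counts as "suspicious", and a `rater` callable standing in for the human reviewer. The claim itself does not specify a model type, a score band, or an update rule.

```python
import math
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]


class OnlineLogisticModel:
    """Toy compliance model updated one example at a time (an assumption)."""

    def __init__(self, n_features: int, learning_rate: float = 0.5) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def prob(self, x: Features) -> float:
        """Estimated probability that the item is violating."""
        z = self.bias + sum(w * v for w, v in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x: Features, is_violating: bool) -> None:
        """One gradient step on the log loss for a single labeled item."""
        error = (1.0 if is_violating else 0.0) - self.prob(x)
        self.weights = [w + self.learning_rate * error * v
                        for w, v in zip(self.weights, x)]
        self.bias += self.learning_rate * error


def triage(model: OnlineLogisticModel, x: Features,
           low: float = 0.3, high: float = 0.7) -> str:
    """Three-way classification; the band limits are hypothetical."""
    p = model.prob(x)
    if p >= high:
        return "violating"
    if p <= low:
        return "compliant"
    return "suspicious"


def review_suspicious(
    model: OnlineLogisticModel,
    items: List[Features],
    rater: Callable[[Features], bool],
) -> List[Tuple[Features, bool]]:
    """Send suspicious items to the rater and fold each answer into the model."""
    reviewed = []
    for x in items:
        if triage(model, x) == "suspicious":
            label = rater(x)        # manual classification
            model.update(x, label)  # update with label and feature values
            reviewed.append((x, label))
    return reviewed


# Hypothetical usage: an untrained model is uncertain about everything.
demo_model = OnlineLogisticModel(n_features=2)
demo_item = [1.0, 0.5]
print(triage(demo_model, demo_item))
review_suspicious(demo_model, [demo_item], rater=lambda item: True)
print(demo_model.prob(demo_item) > 0.5)
```

Routing only the uncertain band to raters keeps manual effort focused on the items where the model's output is least reliable, which is one plausible reading of why a "suspicious" class is useful.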
11. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations comprising:
receiving by the data processing apparatus training data that specify manual classifications of content items and feature values for each of the content items, the manual classification for each of the content items specifying whether the content item is a violating content item that violates content item distribution guidelines, the feature values specifying one or more characteristics of the content items and characteristics of landing pages to which the content items link;
training by the data processing apparatus a compliance model using the training data, the compliance model being trained to classify an unclassified content item as a violating content item based on the feature values of the unclassified content item;
determining by the data processing apparatus that the compliance model has an accuracy measure that meets a threshold accuracy measure;
in response to determining that the accuracy measure for the compliance model meets the accuracy threshold, classifying by the data processing apparatus unclassified content items using the feature values for the unclassified content items;
updating by the data processing apparatus the compliance model using the classifications of the unclassified content items; and
preventing by the data processing apparatus violating content items from being distributed;
wherein classifying unclassified content items comprises classifying at least one unclassified content item as a suspicious content item, and further comprising:
providing the suspicious content item to a rater for manual classification;
receiving data specifying a manual classification of the suspicious content item; and
updating the compliance model using the manual classification and the feature values of the suspicious content item.
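Claim 11 adds an operation that prevents violating content items from being distributed. A minimal sketch of such a gate is shown below. The item identifiers, the label strings, and the choice to let items without a "violating" classification through are all assumptions made for illustration. The claim does not say how distribution is blocked.

```python
from typing import Dict, List

VIOLATING = "violating"  # hypothetical label string


def filter_distributable(candidates: List[str],
                         classifications: Dict[str, str]) -> List[str]:
    """Return candidate item ids, dropping any classified as violating.

    Items with any other classification (or none) pass through unchanged;
    a stricter policy could also hold back suspicious items.
    """
    return [item_id for item_id in candidates
            if classifications.get(item_id) != VIOLATING]


# Hypothetical usage with made-up item ids.
labels = {"ad1": "compliant", "ad2": "violating", "ad3": "suspicious"}
print(filter_distributable(["ad1", "ad2", "ad3"], labels))
```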
12. A system comprising:
a data store storing training data that specify manual classifications of content items and feature values for each of the content items, the manual classification for each of the content items specifying whether the content item is a violating content item that violates content item distribution guidelines, the feature values specifying one or more characteristics of the content items and characteristics of landing pages to which the content items link; and
one or more computers, each having a memory, a processor, and code stored in the memory and executable in the processor, the one or more computers being operable to interact with the data store and to cause the processor to perform operations including:
receiving the training data;
training a compliance model using the training data, the compliance model being trained to classify an unclassified content item as a violating content item based on the feature values of the unclassified content item;
determining that the compliance model has an accuracy measure that meets a threshold accuracy measure;
in response to determining that the accuracy measure for the compliance model meets the accuracy threshold, classifying unclassified content items using the feature values for the unclassified content items; and
providing data specifying the classifications of the unclassified content items;
wherein the one or more computers are further operable to perform operations including:
classifying at least one unclassified content item as a suspicious content item, and further comprising:
providing the suspicious content item to a rater for manual classification;
receiving data specifying a manual classification of the suspicious content item; and
updating the compliance model using the manual classification and the feature values of the suspicious content item.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21
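The data store in claim 12 holds, for each content item, a manual classification plus feature values for both the item and its landing page. One possible layout is sketched below using an in-memory SQLite database. The table name, column names, and JSON encoding of feature values are hypothetical and are not taken from the patent.

```python
import json
import sqlite3
from typing import List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with a hypothetical training_data table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE training_data ("
        " item_id TEXT PRIMARY KEY,"
        " is_violating INTEGER NOT NULL,"          # manual classification
        " item_features TEXT NOT NULL,"            # JSON list of floats
        " landing_page_features TEXT NOT NULL)"    # JSON list of floats
    )
    return conn


def add_example(conn: sqlite3.Connection, item_id: str, is_violating: bool,
                item_features: List[float],
                landing_page_features: List[float]) -> None:
    """Store one manually classified content item."""
    conn.execute(
        "INSERT INTO training_data VALUES (?, ?, ?, ?)",
        (item_id, int(is_violating),
         json.dumps(item_features), json.dumps(landing_page_features)),
    )


def load_training_data(conn: sqlite3.Connection) -> List[Tuple[List[float], bool]]:
    """Return (feature values, label) pairs, item features first."""
    rows = conn.execute(
        "SELECT is_violating, item_features, landing_page_features"
        " FROM training_data ORDER BY item_id"
    )
    return [(json.loads(item) + json.loads(page), bool(label))
            for label, item, page in rows]


# Hypothetical usage with made-up values.
store = open_store()
add_example(store, "ad1", True, [0.9], [0.8, 1.0])
add_example(store, "ad2", False, [0.1], [0.0, 0.0])
print(load_training_data(store))
```

Keeping item features and landing-page features in separate columns mirrors the claim's distinction between the two, while `load_training_data` concatenates them into the single feature vector a model would consume.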
Specification