Online active learning in user-generated content streams
Abstract
Software for online active learning receives content posted to an online stream at a website. The software converts the content into an elemental representation and inputs the elemental representation into a probit model to obtain a predictive probability that the content is abusive. The software also calculates an importance weight based on the elemental representation. The software updates the probit model using the content, the importance weight, and an acquired label if a first condition is met; this condition depends on an instrumental distribution. The software removes the content from the online stream if a second condition is met; when an acquired label is unavailable, this condition depends on the predictive probability.
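The removal step described in the abstract can be illustrated with a minimal sketch. It assumes that a human label, when one exists, takes precedence over the model, and that removal without a label is a simple threshold test on the predictive probability. The names `should_remove`, `filter_stream`, `p_abusive`, and the `threshold` value are hypothetical and do not come from the patent.

```python
def should_remove(predictive_probability, label=None, threshold=0.5):
    """Decide whether to drop one post from the stream.

    label: "abusive", "not abusive", or None when no human label exists.
    threshold: hypothetical cutoff; the source does not state a value.
    """
    if label is not None:
        # Assumption: an acquired human label overrides the model.
        return label == "abusive"
    # No label available: fall back to the model's predictive probability.
    return predictive_probability > threshold


def filter_stream(posts, threshold=0.5):
    """Return the modified stream, i.e. the posts that are not removed."""
    return [
        post for post in posts
        if not should_remove(post["p_abusive"], post.get("label"), threshold)
    ]


posts = [
    {"text": "great photo", "p_abusive": 0.03},
    {"text": "insult one", "p_abusive": 0.91},
    {"text": "borderline joke", "p_abusive": 0.70, "label": "not abusive"},
]
modified = filter_stream(posts)
kept_texts = [post["text"] for post in modified]
```

In this example the second post is dropped on the model's probability alone, while the third is kept because its human label says it is not abusive.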
17 Claims
1. A method for delivering modified user generated content for display on client devices, comprising the operations of:
receiving, at one or more servers over a network, content that is user generated content from an online stream at a website, the content including text;
converting, by a machine process executed at the one or more servers, the content into an elemental representation using a bag of words model;
applying a probit model to the elemental representation to obtain a predictive probability that the content is abusive or not abusive, the machine process further includes, calculating an importance weight for the probit model based on the elemental representation, the importance weight is modeled as a multivariate Gaussian distribution with a mean and a covariance matrix;
creating a probabilistic queue for delivering the content to a human labeler for acquiring a label for the content, wherein placement of the content within the probabilistic queue depends on the predictive probability that the content is abusive or not abusive;
updating the probit model using the elemental representation, the importance weight, and the label acquired from the human labeler, the updating the probit model includes calculating an updated mean and an updated covariance matrix for the multivariate Gaussian distribution of the importance weight based on the label;
receiving, at the one or more servers, a request from a client device for the online stream at the website, the online stream including the content;
applying the probit model having been updated to the content and removing the content from the online stream to produce a modified online stream, the removing is based on the predictive probability that the content is abusive as calculated by the probit model having been updated; and
sending, from the one or more servers, the modified online stream to the client device for display.
Dependent claims: 2, 3, 4, 5, 6
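The converting and applying steps of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: it assumes a fixed vocabulary, a whitespace tokenizer, unit-variance probit noise, and that the Gaussian-distributed weight in the claim acts as the probit model's weight vector. Under those assumptions the predictive probability has the standard closed form Phi(mean.x / sqrt(1 + x' cov x)). All function and variable names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count vector over a fixed vocabulary (hypothetical tokenizer:
    lowercase, split on whitespace)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predictive_probability(x, mean, cov):
    """P(abusive | x) for weights distributed as N(mean, cov).

    Assumes unit-variance probit noise, giving
    Phi(mean.x / sqrt(1 + x' cov x)).
    """
    m = sum(mi * xi for mi, xi in zip(mean, x))
    cov_x = [sum(cij * xj for cij, xj in zip(row, x)) for row in cov]
    s2 = 1.0 + sum(xi * ci for xi, ci in zip(x, cov_x))
    return std_normal_cdf(m / math.sqrt(s2))


# Toy vocabulary and parameters, chosen only for illustration.
vocabulary = ["idiot", "thanks", "stupid"]
x = bag_of_words("You stupid idiot", vocabulary)
mean = [1.0, -1.0, 1.0]
cov = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p = predictive_probability(x, mean, cov)
```

A larger covariance pulls the probability toward 0.5, so content made of rarely seen words yields a less confident prediction than content made of well-learned words.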
7. A method for preventing spread of abusive content to client devices, comprising:
receiving content that is user generated from an online stream at a website, the content including text;
converting the content into elemental representation;
applying a probit model to the elemental representation to obtain a predictive probability that the content is abusive or not abusive, the applying the probit model includes calculating an importance weight based on the elemental representation, the probit model includes a memory loss factor associated with the importance weight, the importance weight is modeled as a multivariate Gaussian distribution with a mean and a covariance matrix;
creating a probabilistic queue for delivering the content to a human labeler for acquiring a label for the content indicative of whether the content is abusive or not abusive, the content is inserted into the probabilistic queue depending on the predictive probability that the content is abusive or not abusive;
updating the probit model using the elemental representation, the importance weight, the memory loss factor, and the label acquired from the human labeler, the updating includes calculating an updated mean and an updated covariance matrix for the multivariate Gaussian distribution of the importance weight based on the label;
receiving a request from a client device for the online stream at the website, the online stream including the content;
applying the probit model having been updated to the content, the applying the probit model having been updated includes, removing the content from the online stream if the predictive probability is determined to be abusive as calculated by the probit model having been updated, or not removing the content from the online stream if the predictive probability is determined to be not abusive as calculated by the probit model having been updated; and
sending the online stream to the client device.
Dependent claims: 8, 9, 10, 11
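The updating step of claim 7 can be sketched with the standard Gaussian moment-matching update for a probit likelihood. The claim text does not give formulas, so this is one plausible reading rather than the patented method. The sketch assumes labels coded as +1 (abusive) and -1 (not abusive), unit-variance probit noise, and that the Gaussian-distributed weight acts as the model's weight vector. The treatment of the memory loss factor, inflating the covariance before each update so that older evidence counts less, is purely hypothetical.

```python
import math


def pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def update(mean, cov, x, label, memory_loss=0.0):
    """Return (new_mean, new_cov) after observing one labeled item.

    label: +1 for abusive, -1 for not abusive.
    memory_loss: hypothetical factor in [0, 1); 0 means no forgetting.
    """
    n = len(mean)
    # Hypothetical forgetting step: widen the prior before the update.
    cov = [[c / (1.0 - memory_loss) for c in row] for row in cov]
    cov_x = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
    s2 = 1.0 + sum(x[i] * cov_x[i] for i in range(n))
    s = math.sqrt(s2)
    z = label * sum(mean[i] * x[i] for i in range(n)) / s
    v = pdf(z) / cdf(z)   # correction to the mean
    w = v * (v + z)       # shrinkage of the variance, between 0 and 1
    new_mean = [mean[i] + label * cov_x[i] * v / s for i in range(n)]
    new_cov = [
        [cov[i][j] - cov_x[i] * cov_x[j] * w / s2 for j in range(n)]
        for i in range(n)
    ]
    return new_mean, new_cov


prior_mean = [0.0, 0.0]
prior_cov = [[1.0, 0.0], [0.0, 1.0]]
new_mean, new_cov = update(prior_mean, prior_cov, [1.0, 0.0], +1)
```

Only the weight of the word that appears in the item moves, and its variance shrinks; the unseen word keeps its prior. For very negative `z` the ratio `pdf(z) / cdf(z)` can lose precision, so a production version would need a numerically stable form.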
12. A computer-readable storage medium that is non-transitory and that stores a program for delivering modified user generated content for display on client devices, wherein the program, when executed, instructs a processor to perform the following operations:
receive, at one or more servers, content posted to an online stream at a website;
convert, by a machine process, the content into an elemental representation using a bag of words model;
apply the probit model to the elemental representation to obtain a predictive probability that the content is abusive, the machine process further includes, calculate an importance weight for the probit model based on the elemental representation, the importance weight is defined by a multivariate Gaussian distribution defined by a mean and a covariance matrix;
create a probabilistic queue for delivering the content to a human labeler for acquiring a label for the content, wherein placement of the content within the probabilistic queue depends on the predictive probability that the content is abusive or not abusive;
update the probit model with the elemental representation and the importance weight, and the label acquired from the human labeler, said update the probit model includes calculating, based on the label, an updated mean and an updated covariance matrix that define the multivariate Gaussian distribution that defines the importance weight;
receive, at the one or more servers, a request from a client device for the online stream at the website, the online stream including the content;
apply the probit model having been updated to the content to remove the content from the online stream to produce a modified online stream, said remove the content is based on the predictive probability that the content is abusive as calculated by the probit model having been updated; and
send, from the one or more servers, the modified online stream to the client device for display.
Dependent claims: 13, 14, 15, 16, 17
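The probabilistic queue recited in the independent claims could be realized in many ways; the claims only require that placement depend on the predictive probability. The sketch below is one hypothetical design: an item is admitted with a query probability that is highest when the model is most uncertain (predictive probability near 0.5), and admitted items are served to the labeler most-uncertain first. The class `ProbabilisticQueue`, the function `query_probability`, and the `floor` parameter are all assumptions made for illustration.

```python
import heapq
import random


def query_probability(p, floor=0.1):
    """Hypothetical sampling rule: 1.0 at p = 0.5, falling linearly to
    `floor` as the prediction becomes confident."""
    return max(floor, 1.0 - abs(2.0 * p - 1.0))


class ProbabilisticQueue:
    """Randomly admits items, then serves the most uncertain first."""

    def __init__(self, rng=None, floor=0.1):
        self._heap = []
        self._rng = rng if rng is not None else random.Random()
        self._floor = floor
        self._counter = 0  # tie-breaker that preserves arrival order

    def offer(self, content, p):
        """Admit content with probability query_probability(p)."""
        q = query_probability(p, self._floor)
        if self._rng.random() >= q:
            return False
        heapq.heappush(self._heap, (abs(p - 0.5), self._counter, content, q))
        self._counter += 1
        return True

    def next_for_labeler(self):
        """Pop the admitted item closest to p = 0.5, with its query
        probability so a caller can reweight the label if desired."""
        _, _, content, q = heapq.heappop(self._heap)
        return content, q

    def __len__(self):
        return len(self._heap)


# floor=1.0 admits everything, which makes this demo deterministic.
queue = ProbabilisticQueue(rng=random.Random(0), floor=1.0)
queue.offer("clearly abusive post", 0.95)
queue.offer("ambiguous post", 0.50)
queue.offer("probably fine post", 0.20)
first, _ = queue.next_for_labeler()
```

Returning the query probability with each item leaves room for importance-style reweighting of the acquired label, which is one way a sampling rule of this kind is usually kept from biasing the model.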
Specification