Classification of Offensive Words
First Claim
1. A computer-implemented method comprising:
obtaining a plurality of text samples;
identifying, from among the plurality of text samples, a first set of text samples that each includes a particular potentially offensive term;
obtaining labels for the first set of text samples that indicate whether the particular potentially offensive term is used in an offensive manner in respective ones of the text samples in the first set of text samples;
training, based at least on the first set of text samples and the labels for the first set of text samples, a classifier that is configured to use one or more signals associated with a text sample to generate a label that indicates whether a potentially offensive term in the text sample is used in an offensive manner in the text sample; and
providing, to the classifier, a first text sample that includes the particular potentially offensive term, and in response, obtaining, from the classifier, a label that indicates whether the particular potentially offensive term is used in an offensive manner in the first text sample.
Abstract
A computer-implemented method can include identifying a first set of text samples that include a particular potentially offensive term. Labels can be obtained for the first set of text samples that indicate whether the particular potentially offensive term is used in an offensive manner. A classifier can be trained based at least on the first set of text samples and the labels, the classifier being configured to use one or more signals associated with a text sample to generate a label that indicates whether a potentially offensive term in the text sample is used in an offensive manner in the text sample. The method can further include providing, to the classifier, a first text sample that includes the particular potentially offensive term, and in response, obtaining, from the classifier, a label that indicates whether the particular potentially offensive term is used in an offensive manner in the first text sample.
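The steps summarized in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation the patent describes. It assumes a tiny Naive Bayes model with add-one smoothing as the classifier and the surrounding words as the "signals". The example term "bloody", the sample sentences, the labels, and every name (`tokenize`, `context_signals`, `OffensiveUseClassifier`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def context_signals(text, term):
    """Assumed signals: every word in the sample except the term itself."""
    return [tok for tok in tokenize(text) if tok != term]


class OffensiveUseClassifier:
    """Hypothetical Naive Bayes classifier for one potentially offensive term."""

    def __init__(self, term):
        self.term = term
        self.word_counts = {True: Counter(), False: Counter()}
        self.label_counts = Counter()
        self.vocab = set()

    def train(self, samples, labels):
        # Count context words separately for offensive and non-offensive uses.
        for text, label in zip(samples, labels):
            signals = context_signals(text, self.term)
            self.word_counts[label].update(signals)
            self.label_counts[label] += 1
            self.vocab.update(signals)

    def _log_score(self, signals, label):
        total = sum(self.word_counts[label].values())
        n_samples = sum(self.label_counts.values())
        score = math.log((self.label_counts[label] + 1) / (n_samples + 2))
        for tok in signals:
            count = self.word_counts[label][tok]
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            score += math.log((count + 1) / (total + len(self.vocab) + 1))
        return score

    def label(self, text):
        """Return True if the term appears to be used offensively."""
        signals = context_signals(text, self.term)
        return self._log_score(signals, True) > self._log_score(signals, False)


# Obtain a plurality of text samples (made-up data).
samples = [
    "that bloody idiot ruined my day",
    "shut your bloody mouth you idiot",
    "the bandage was bloody after the surgery",
    "he had a bloody nose after the game",
    "the weather is nice today",
]
term = "bloody"

# Identify the first set: samples that include the term.
first_set = [s for s in samples if term in tokenize(s)]

# Obtain labels for the first set (hand-assigned here for illustration).
labels = [True, True, False, False]

# Train the classifier, then provide it a new sample that includes the term.
clf = OffensiveUseClassifier(term)
clf.train(first_set, labels)
print(clf.label("you bloody idiot"))
print(clf.label("a bloody nose after surgery"))
```

A real system would likely use richer signals and far more labeled data; the sketch only mirrors the order of the steps.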
61 Citations
21 Claims
1. A computer-implemented method comprising:
obtaining a plurality of text samples;
identifying, from among the plurality of text samples, a first set of text samples that each includes a particular potentially offensive term;
obtaining labels for the first set of text samples that indicate whether the particular potentially offensive term is used in an offensive manner in respective ones of the text samples in the first set of text samples;
training, based at least on the first set of text samples and the labels for the first set of text samples, a classifier that is configured to use one or more signals associated with a text sample to generate a label that indicates whether a potentially offensive term in the text sample is used in an offensive manner in the text sample; and
providing, to the classifier, a first text sample that includes the particular potentially offensive term, and in response, obtaining, from the classifier, a label that indicates whether the particular potentially offensive term is used in an offensive manner in the first text sample.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
17. One or more computer-readable devices having instructions stored thereon that, when executed by one or more processors, cause performance of operations comprising:
obtaining a plurality of text samples;
identifying, from among the plurality of text samples, a first set of text samples that each includes a particular potentially offensive term;
obtaining labels for the first set of text samples that indicate whether the particular potentially offensive term is used in an offensive manner in respective ones of the text samples in the first set of text samples;
training, based at least on the first set of text samples and the labels for the first set of text samples, a classifier that is configured to use one or more signals associated with a text sample to generate a label that indicates whether a potentially offensive term in the text sample is used in an offensive manner in the text sample; and
providing, to the classifier, a first text sample that includes the particular potentially offensive term, and in response, obtaining, from the classifier, a label that indicates whether the particular potentially offensive term is used in an offensive manner in the first text sample.
Dependent claims: 18, 19
20. A system comprising:
one or more computers configured to provide:
a repository of potentially offensive terms;
a repository of labeled text samples that includes a first set of labeled text samples for which one or more potentially offensive terms from the repository of potentially offensive terms have been labeled in the first set of text samples so as to indicate likelihoods that the potentially offensive terms are used in offensive manners in particular ones of the text samples in the first set of labeled text samples;
a repository of non-labeled text samples that includes a first set of non-labeled text samples that include one or more potentially offensive terms from the repository of potentially offensive terms;
a classifier that labels the one or more potentially offensive terms in the first set of non-labeled text samples to generate a second set of labeled text samples that are labeled so as to indicate a likelihood that the one or more potentially offensive terms in the text samples are used in offensive manners; and
a training engine that trains the classifier based at least on the first set of labeled text samples and the second set of labeled text samples that were labeled by the classifier.
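The arrangement in claim 20, where a classifier labels non-labeled samples and a training engine then retrains on both sets, resembles what is commonly called self-training. The sketch below is one hypothetical way to wire those pieces together, not the claimed system itself. The confidence `threshold`, the Naive Bayes likelihood model, the example term, the data, and all names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def signals(text, terms):
    """Assumed signals: the words around the potentially offensive terms."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in terms]


class LikelihoodClassifier:
    """Hypothetical classifier that outputs a likelihood of offensive use."""

    def __init__(self, terms):
        self.terms = set(terms)  # stands in for the repository of terms
        self.counts = {True: Counter(), False: Counter()}
        self.n = Counter()

    def fit(self, labeled):
        # Retrain from scratch on (text, is_offensive) pairs.
        self.counts = {True: Counter(), False: Counter()}
        self.n = Counter()
        for text, offensive in labeled:
            self.counts[offensive].update(signals(text, self.terms))
            self.n[offensive] += 1

    def likelihood(self, text):
        vocab = len(set(self.counts[True]) | set(self.counts[False])) + 1
        log = {}
        for label in (True, False):
            total = sum(self.counts[label].values())
            log[label] = math.log((self.n[label] + 1) / (sum(self.n.values()) + 2))
            for tok in signals(text, self.terms):
                log[label] += math.log((self.counts[label][tok] + 1) / (total + vocab))
        # Convert the two log scores into a probability of offensive use.
        return 1 / (1 + math.exp(log[False] - log[True]))


def self_train(classifier, labeled, unlabeled, threshold=0.8):
    """Training engine sketch: label the unlabeled set, then retrain on both."""
    classifier.fit(labeled)
    second_set = []
    for text in unlabeled:
        p = classifier.likelihood(text)
        # Assumption: keep only confident labels to limit error feedback.
        if p >= threshold or p <= 1 - threshold:
            second_set.append((text, p >= 0.5))
    classifier.fit(labeled + second_set)
    return second_set


# Made-up repositories of labeled and non-labeled samples.
labeled = [
    ("that bloody idiot ruined my day", True),
    ("shut your bloody mouth you idiot", True),
    ("the bandage was bloody after the surgery", False),
    ("he had a bloody nose after the game", False),
]
unlabeled = ["you bloody idiot shut up", "a bloody bandage after the game"]

clf = LikelihoodClassifier(["bloody"])
second_set = self_train(clf, labeled, unlabeled)
print(second_set)
```

The threshold is a design choice rather than something the claim requires: without it, every classifier-generated label, including uncertain ones, would be fed back into training.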
21. A computer-implemented method comprising:
obtaining a plurality of text samples;
identifying, from among the plurality of text samples, a first set of text samples that each includes a particular potentially offensive term;
obtaining labels for the first set of text samples that indicate whether a particular user considers the particular potentially offensive term to be used in an offensive manner in respective ones of the text samples in the first set of text samples;
training, based at least on the first set of text samples and the labels for the first set of text samples, a user-specific classifier for the particular user, wherein the user-specific classifier is configured to use one or more signals associated with a text sample to generate a label that indicates whether a potentially offensive term in the text sample is likely to be considered by the particular user to be used in an offensive manner in the text sample; and
providing, to the user-specific classifier, a first text sample that includes the particular potentially offensive term, and in response, obtaining, from the user-specific classifier, a label that indicates whether the particular potentially offensive term is likely to be considered by the particular user to be used in an offensive manner in the first text sample.
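The user-specific variant in claim 21 can be illustrated by training one classifier per user from that user's own labels. The sketch below deliberately uses a very simple vote-counting model (each context word gains or loses one point per labeled sample) as a stand-in for whatever classifier a real system would use. The user names, sample sentences, labels, and function names are all hypothetical.

```python
import re
from collections import Counter


def context(text, term):
    """Assumed signals: the words in the sample other than the term."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t != term]


def train_user_classifier(samples, labels, term):
    """Build a classifier from one user's labels for the same text samples."""
    weights = Counter()
    for text, offensive in zip(samples, labels):
        # Each distinct context word votes once per labeled sample.
        for tok in set(context(text, term)):
            weights[tok] += 1 if offensive else -1

    def classify(text):
        # Positive total weight means this user would likely find it offensive.
        return sum(weights[tok] for tok in context(text, term)) > 0

    return classify


term = "bloody"
samples = [
    "that bloody idiot ruined my day",
    "bloody hell what a goal",
    "he had a bloody nose",
]

# Two hypothetical users label the same samples differently.
labels_by_user = {
    "alice": [True, True, False],
    "bob": [True, False, False],
}

classifiers = {
    user: train_user_classifier(samples, labels, term)
    for user, labels in labels_by_user.items()
}

text = "bloody hell what a match"
for user, classify in classifiers.items():
    print(user, classify(text))
```

With these made-up labels the two classifiers disagree on the exclamation but agree on a direct insult, which is the kind of per-user difference the claim is aimed at.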
Specification