Classification of offensive words

US 10,635,750 B1
Filed: 04/17/2018
Issued: 04/28/2020
Est. Priority Date: 04/29/2014
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A computer-implemented method can include identifying a first set of text samples that include a particular potentially offensive term. Labels can be obtained for the first set of text samples that indicate whether the particular potentially offensive term is used in an offensive manner. A classifier can be trained based at least on the first set of text samples and the labels, the classifier being configured to use one or more signals associated with a text sample to generate a label that indicates whether a potentially offensive term in the text sample is used in an offensive manner in the text sample. The method can further include providing, to the classifier, a first text sample that includes the particular potentially offensive term, and in response, obtaining, from the classifier, a label that indicates whether the particular potentially offensive term is used in an offensive manner in the first text sample.
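The training step summarized in the abstract can be illustrated with a minimal sketch. The patent does not name a model family, so the naive Bayes model, the whitespace tokenizer, the function names `train` and `classify`, and the example term "bloody" with its four labeled samples are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def train(samples, labels, term):
    """Fit a two-class naive Bayes model over the words surrounding `term`."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, offensive in zip(samples, labels):
        class_counts[offensive] += 1
        word_counts[offensive].update(w for w in tokenize(text) if w != term)
    vocab = set(word_counts[True]) | set(word_counts[False])
    return {"term": term, "words": word_counts,
            "classes": class_counts, "vocab": vocab}


def classify(model, text):
    """Return True if the term is judged offensive in this text sample."""
    total = sum(model["classes"].values())
    scores = {}
    for label in (True, False):
        counts = model["words"][label]
        denom = sum(counts.values()) + len(model["vocab"])
        # Laplace-smoothed class prior and word likelihoods, in log space.
        score = math.log((model["classes"][label] + 1) / (total + 2))
        for w in tokenize(text):
            if w != model["term"] and w in model["vocab"]:
                score += math.log((counts[w] + 1) / denom)
        scores[label] = score
    return scores[True] > scores[False]


# Hypothetical labeled text samples for one potentially offensive term.
samples = [
    "that bloody idiot ruined it",
    "you bloody fool",
    "the bandage was bloody after surgery",
    "a bloody nose from the game",
]
labels = [True, True, False, False]
model = train(samples, labels, term="bloody")
```

Here the "signals associated with a text sample" are reduced to the other words in the sample; a production classifier would presumably use richer signals.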

58 Citations

18 Claims

1. A computer-implemented method, comprising:
- receiving a text sample at a computing device, the text sample comprising a set of terms;
  
  identifying, by the computing device, that a first term of the set of terms of the text sample is designated as a term that is potentially offensive in some but not all contexts;
  
  after identifying that the first term of the set of terms of the text sample is designated as a term that is potentially offensive in some but not all contexts, providing the text sample to an offensive term classifier, wherein the offensive term classifier is trained to process text samples containing the first term and to generate indications of whether, in respective contexts defined by the text samples, the first term is to be selectively redacted from a representation of the text sample that is output;
  
  obtaining, by the computing device and from the offensive term classifier, an indication that, in a particular context defined by the text sample, the first term is used in an offensive manner;
  
  in response to obtaining the indication that, in the particular context defined by the text sample, the first term is used in the offensive manner, redacting the first term from the text sample to generate a redacted version of the text sample;
  
  presenting, by the computing device, the redacted version of the text sample;
  
  after presenting the redacted version of the text sample, receiving a user input to un-redact the first term; and
  
  retraining the offensive term classifier using the user input as a training signal that indicates that the first term is to not be selectively redacted from representations of text samples.
- - 2. The computer-implemented method of claim 1, further comprising:
    - receiving, by the computing device, an utterance spoken by a user of the computing device; and
      
      transcribing the utterance to generate the text sample.
  - 3. The computer-implemented method of claim 1, wherein:
    - the text sample is a transcription of an utterance spoken by a user of the computing device; and
      
      the method further comprises providing information about the user of the computing device as an input to the offensive term classifier along with the text sample to be used by the offensive term classifier in generating an indication of whether the first term is to be selectively redacted.
  - 4. The computer-implemented method of claim 1, wherein:
    - the text sample was obtained through a website accessed by the computing device or an application on the computing device;
      
      the method further comprises providing information identifying the website or the application as an input to the offensive term classifier along with the text sample to be used by the offensive term classifier in generating an indication of whether the first term is to be selectively redacted.
  - 5. The computer-implemented method of claim 1, wherein the offensive term classifier was trained using machine-learning techniques.
  - 6. The computer-implemented method of claim 1, wherein:
    - the text sample is a transcription of an utterance spoken by a user of the computing device;
      
      the method further comprises obtaining a score that indicates a speech recognition confidence score for the utterance and providing the speech recognition confidence score as an input to the offensive term classifier along with the text sample to be used by the offensive term classifier in generating an indication of whether the first term is to be selectively redacted.
  - 7. The computer-implemented method of claim 6, wherein the offensive term classifier is more likely to indicate that the first term is to be selectively redacted when the speech recognizer confidence score that is input to the offensive term classifier indicates a lower confidence in the accuracy of the transcription, and the offensive term classifier is less likely to indicate that the first term is to be selectively redacted when the score indicates a higher confidence in the accuracy of the transcription.
  - 8. The computer-implemented method of claim 1, wherein the offensive term classifier is trained to use information about additional words in the text sample other than the first term to determine whether the first term is to be selectively redacted.
  - 9. The computer-implemented method of claim 1, wherein the offensive term classifier is specific to the first term, such that the offensive term classifier is only trained to process text samples containing the first term and to generate indications of whether the first term is to be selectively redacted.
  - 10. The computer-implemented method of claim 1, further comprising:
    - receiving a second text sample at the computing device;
      
      identifying, by the computing device, that the second text sample contains the first term that is designated as a term that is also potentially offensive in some but not all contexts;
      
      providing the second text sample to the offensive term classifier;
      
      obtaining, by the computing device and from the offensive term classifier, an indication that the first term is not to be selectively redacted;
      
      in response to obtaining the indication that the first term is not to be selectively redacted, presenting an un-redacted version of the second text sample.
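The flow recited in claim 1 (identify a designated term, classify it in context, redact, accept an un-redact input, retrain) can be sketched as follows. This is a toy illustration under stated assumptions, not the patented implementation: the class `OffensiveTermClassifier`, the functions `present` and `on_unredact`, the word-vote scoring, the asterisk masking, and the example term are all hypothetical.

```python
class OffensiveTermClassifier:
    """Toy per-term classifier: context words carry net 'offensive' votes."""

    def __init__(self, term):
        self.term = term
        self.votes = {}  # context word -> net offensive votes

    def train(self, text, offensive):
        # Incremental update standing in for retraining.
        for word in text.lower().split():
            if word != self.term:
                delta = 1 if offensive else -1
                self.votes[word] = self.votes.get(word, 0) + delta

    def should_redact(self, text):
        score = sum(self.votes.get(w, 0) for w in text.lower().split())
        return score > 0


def present(text, classifiers):
    """Return (displayed text, list of redacted terms).

    The keys of `classifiers` play the role of the list of terms designated
    as potentially offensive in some but not all contexts.
    """
    shown, redacted = [], []
    for word in text.split():
        clf = classifiers.get(word.lower())
        if clf is not None and clf.should_redact(text):
            shown.append("*" * len(word))
            redacted.append(word.lower())
        else:
            shown.append(word)
    return " ".join(shown), redacted


def on_unredact(text, term, classifiers):
    # The user's un-redact input is treated as a "not offensive" label.
    classifiers[term].train(text, offensive=False)


classifiers = {"bloody": OffensiveTermClassifier("bloody")}
classifiers["bloody"].train("you bloody fool", offensive=True)

before, redacted = present("you bloody fool", classifiers)
on_unredact("you bloody fool", "bloody", classifiers)
after, _ = present("you bloody fool", classifiers)
```

In this sketch a single un-redact input cancels the single offensive training example, so the same sample is shown un-redacted afterwards; how strongly a real system weights one user signal is not specified in the claims.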

11. One or more non-transitory computer-readable media having instructions stored thereon that, when executed by one or more processors of a computing device, cause the one or more processors to perform operations comprising:
- receiving a text sample at a computing device, the text sample comprising a set of terms;
  
  identifying, by the computing device, that a first term of the set of terms of the text sample is designated as a term that is potentially offensive in some but not all contexts;
  
  after identifying that the first term of the set of terms of the text sample is designated as a term that is potentially offensive in some but not all contexts, providing the text sample to an offensive term classifier, wherein the offensive term classifier is trained to process text samples containing the first term and to generate indications of whether, in respective contexts defined by the text samples, the first term is to be selectively redacted from a representation of the text sample that is output;
  
  obtaining, by the computing device and from the offensive term classifier, an indication that, in a particular context defined by the text sample, the first term is used in an offensive manner;
  
  in response to obtaining the indication that, in the particular context defined by the text sample, the first term is used in the offensive manner, redacting the first term from the text sample to generate a redacted version of the text sample;
  
  presenting, by the computing device, the redacted version of the text sample;
  
  after presenting the redacted version of the text sample, receiving a user input to un-redact the first term; and
  
  retraining the offensive term classifier using the user input as a training signal that indicates that the first term is to not be selectively redacted from representations of text samples.
- - 12. The computer-readable media of claim 11, wherein the operations further comprise:
    - receiving, by the computing device, an utterance spoken by a user of the computing device; and
      
      transcribing the utterance to generate the text sample.
  - 13. The computer-readable media of claim 11, wherein:
    - the text sample is a transcription of an utterance spoken by a user of the computing device; and
      
      the operations further comprise providing information about the user of the computing device as an input to the offensive term classifier along with the text sample to be used by the offensive term classifier in generating an indication of whether the first term is to be selectively redacted.
  - 14. The computer-readable media of claim 11, wherein:
    - the text sample was obtained through a website accessed by the computing device or an application on the computing device;
      
      the operations further comprise providing information identifying the website or the application as an input to the offensive term classifier along with the text sample to be used by the offensive term classifier in generating an indication of whether the first term is to be selectively redacted.
  - 15. The computer-readable media of claim 11, wherein the offensive term classifier was trained using machine-learning techniques.
  - 16. The computer-readable media of claim 11, wherein:
    - the text sample is a transcription of an utterance spoken by a user of the computing device;
      
      the operations further comprise obtaining a score that indicates a speech recognition confidence score for the utterance and providing the speech recognition confidence score as an input to the offensive term classifier along with the text sample to be used by the offensive term classifier in generating an indication of whether the first term is to be selectively redacted.
  - 17. The computer-readable media of claim 16, wherein the offensive term classifier is more likely to indicate that the first term is to be selectively redacted when the speech recognizer confidence score that is input to the offensive term classifier indicates a lower confidence in the accuracy of the transcription, and the offensive term classifier is less likely to indicate that the first term is to be selectively redacted when the score indicates a higher confidence in the accuracy of the transcription.
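Claims 7 and 17 describe a monotonic relationship: lower speech recognition confidence makes redaction more likely, and higher confidence makes it less likely. One way to picture that relationship is a logistic score in which the confidence enters with a negative weight. The function `redact_probability`, the `context_score` input, the 0.5 midpoint, and the weight of 2.0 are assumptions for illustration; the claims do not specify any formula.

```python
import math


def redact_probability(context_score, asr_confidence, confidence_weight=2.0):
    """Toy logistic model of the redaction decision.

    context_score: real-valued evidence from the surrounding words
                   (positive means the usage looks offensive).
    asr_confidence: speech recognition confidence in [0, 1].
    """
    # Lower transcription confidence pushes z up, raising the probability.
    z = context_score - confidence_weight * (asr_confidence - 0.5)
    return 1.0 / (1.0 + math.exp(-z))


low_conf = redact_probability(0.0, asr_confidence=0.2)
high_conf = redact_probability(0.0, asr_confidence=0.9)
```

With identical context evidence, `low_conf` exceeds `high_conf`, matching the direction stated in the claims.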

18. A computing device comprising:
- one or more processors; and
  
  one or more computer-readable media having instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
  
  receiving a text sample at a computing device, the text sample comprising a set of terms;
  
  identifying, by the computing device, that a first term of the set of terms of the text sample is designated as a term that is potentially offensive in some but not all contexts;
  
  after identifying that the first term of the set of terms of the text sample is designated as a term that is potentially offensive in some but not all contexts, providing the text sample to an offensive term classifier, wherein the offensive term classifier is trained to process text samples containing the first term and to generate indications of whether, in respective contexts defined by the text samples, the first term is to be selectively redacted from a representation of the text sample that is output;
  
  obtaining, by the computing device and from the offensive term classifier, an indication that, in a particular context defined by the text sample, the first term is used in an offensive manner;
  
  in response to obtaining the indication that, in the particular context defined by the text sample, the first term is used in the offensive manner, redacting the first term from the text sample to generate a redacted version of the text sample;
  
  presenting, by the computing device, the redacted version of the text sample;
  
  after presenting the redacted version of the text sample, receiving a user input to un-redact the first term; and
  
  retraining the offensive term classifier using the user input as a training signal that indicates that the first term is to not be selectively redacted from representations of text samples.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Epstein, Mark Edward; Mengibar, Pedro J. Moreno
Primary Examiner(s): Washburn, Daniel C
Assistant Examiner(s): Ogunbiyi, Oluwadamilola M
Application Number: US15/955,066
Time in Patent Office: 742 Days
CPC Class Codes

G06F 40/205   Parsing

G06F 40/253   Grammatical analysis; Style...

G06F 40/279   Recognition of textual enti...

G06F 40/30   Semantic analysis
