Systems and methods for word offensiveness detection and processing using weighted dictionaries and normalization

US 8,296,130 B2
Filed: 01/29/2010
Issued: 10/23/2012
Est. Priority Date: 01/29/2010
Status: Active Grant

Abstract

Computer-implemented systems and methods are provided for identifying language that would be considered obscene or otherwise offensive to a user or proprietor of a system. A plurality of offensive words are received, where each offensive word is associated with a severity score identifying the offensiveness of that word. A string of words is received. A distance between a candidate word and each offensive word in the plurality of offensive words is calculated, and a plurality of offensiveness scores for the candidate word are calculated, each offensiveness score based on the calculated distance between the candidate word and the offensive word and the severity score of the offensive word. A determination is made as to whether the candidate word is an offender word, where the candidate word is deemed to be an offender word when the highest offensiveness score in the plurality of offensiveness scores exceeds an offensiveness threshold value.
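
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes a Levenshtein distance, takes the length-dependent term to be simply the length of the dictionary word, and uses the first scoring formula recited in the claims, score = A*((B−C)/B). The names `levenshtein`, `highest_offensiveness`, `is_offender`, and the placeholder dictionary `WORDS` are hypothetical.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def highest_offensiveness(candidate: str, dictionary: Dict[str, float]) -> float:
    """Score the candidate against every dictionary word and keep the maximum.

    Assumes non-empty, lowercase dictionary words; B is taken as len(word).
    """
    best = 0.0
    for word, severity in dictionary.items():
        b = len(word)
        c = levenshtein(candidate.lower(), word)
        best = max(best, severity * ((b - c) / b))
    return best


def is_offender(candidate: str, dictionary: Dict[str, float], threshold: float) -> bool:
    """A candidate is an offender when its highest score exceeds the threshold."""
    return highest_offensiveness(candidate, dictionary) > threshold


# Placeholder dictionary: word -> severity score (values are made up).
WORDS = {"darn": 2.0, "heck": 1.0}
print(is_offender("d4rn", WORDS, 1.0))   # True: one edit away from "darn"
print(is_offender("hello", WORDS, 1.0))  # False
```

Note that with this distance an innocent near-neighbour such as "barn" scores the same as "d4rn", which is why the choice of threshold matters.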

30 Claims

1. A method, comprising:
- receiving, using one or more data processors, a plurality of offensive words, wherein each respective offensive word in the plurality of offensive words is associated with a severity score identifying the offensiveness of the respective word;
  
  receiving, using the one or more data processors, a string of words, wherein a candidate word is selected from the string of words;
  
  calculating, using the one or more data processors, for each respective offensive word in the plurality of offensive words, a distance between the candidate word and the respective offensive word;
  
  calculating, using the one or more data processors, a plurality of offensiveness scores for the candidate word, each offensiveness score in the plurality of offensiveness scores based upon (i) the calculated distance between the candidate word and an offensive word in the plurality of offensive words and (ii) the severity score of the offensive word, wherein the plurality of offensiveness scores are calculated according to one or more of:
  
  offensiveness score = A*((B−C)/B);
  
  offensiveness score = A*((B−(1/C))/B);
  
  offensiveness score = Max(((A−C)/A), 0); and
  
  offensiveness score = (((B−C)/B) > T);
  
  wherein A is the severity score for an offensive word in the plurality of offensive words;
  
  B is a function of a length of the offensive word;
  
  C is the calculated distance between the candidate word and the offensive word; and
  
  T is a threshold value; and
  
  determining, using the one or more data processors, whether the candidate word is an offender word based on whether the highest offensiveness score in the plurality of offensiveness scores for the candidate word exceeds an offensiveness threshold value.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15):
  - 2. The method of claim 1, wherein each word in the plurality of offensive words and each word in the string of words comprises an abbreviation, a single word, a phrase, or a sentence.
  - 3. The method of claim 1, wherein the distance is calculated as:
    - a Levenshtein distance, a Hamming distance, a Damerau-Levenshtein distance, a Dice coefficient, a Sørensen similarity index, or a Jaro-Winkler distance.
  - 4. The method of claim 1, wherein the offensiveness threshold value is set by a service administrator;
    - wherein the string of words is an input from a user to a service; and
      
      wherein the input from the user to the service is rejected if a candidate word in the string of words is identified as an offender word by having an offensiveness score exceeding the offensiveness threshold value set by the service administrator.
  - 5. The method of claim 4, wherein the service is a content review portal, and wherein the offensiveness threshold is set based on one of:
    - grouping of the content in which content being reviewed resides;
      
      a particular content with which the offensiveness threshold is associated; and
      
      a third-party content rating for content.
  - 6. The method of claim 5, wherein the grouping of the content comprises a particular classification of subject matter, a genre, geography of origin, wherein geography comprises country or countries, state, city, principality or collections of regions or subregions thereof, a group of professional or government certifications or ratings, or industry festival or event selections.
  - 7. The method of claim 4, wherein the service is selected from the group consisting of:
    - a message board;
      
      a content review portal;
      
      a chat room;
      
      a bulletin board system;
      
      a social networking website, and a multiplayer game.
  - 8. The method of claim 1, wherein:
    - the offensiveness threshold value is set by a user of a service;
      
      the string of words is an intended output from the service to the user; and
      
      the string of words containing a candidate word identified as an offender word by having an offensiveness score that exceeds the offensiveness threshold set by the user is modified prior to being displayed to the user.
  - 9. The method of claim 8, wherein the string of words is modified according to one of the following:
    - deleting the string of words such that the string of words is not displayed to the user;
      
      deleting the offensive word from the string of words so that the offensive word is not displayed to the user;
      
      censoring the string of words such that the string of words is not displayed to the user;
      
      or censoring the offensive word from the string of words so that the offensive word is not displayed to the user.
  - 10. The method of claim 8, wherein the plurality of offensive words and an offensiveness threshold are set based on cultural norms identified with the user.
  - 11. The method of claim 8, wherein the plurality of offensive words and an offensiveness threshold are set based upon definitions defined by a government institution having jurisdictional authority for a user or a non-governmental institution with which the user is associated.
  - 12. The method of claim 8, wherein a maximum offensiveness threshold is set for a user, and wherein the user cannot set the offensiveness threshold higher than the maximum offensiveness threshold.
  - 13. The method of claim 1, wherein the string of words containing a candidate word identified as an offender word by having an offensiveness score that exceeds the offensiveness threshold is rejected as input into the system.
  - 14. The method of claim 1, wherein the plurality of offensive words and severity score identifying each of the plurality of offensive words are identified by a user, a service administrator, a third-party, or any combination thereof.
  - 15. The method of claim 1, wherein the highest offensiveness score is one of:
    - a smallest value offensiveness score calculated in comparing each of the plurality of offensive words with the candidate word, or a largest value offensiveness score calculated in comparing each of the plurality of offensive words with the candidate word.
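
To make the four scoring formulas of claim 1 concrete, the sketch below writes each one as a small function using the claim's own symbols (A = severity, B = a function of the offensive word's length, C = calculated distance, T = threshold). The function names are hypothetical, and the guards against division by zero are assumptions added for the sketch; the claim does not say how those cases are handled.

```python
def score_linear(a: float, b: float, c: float) -> float:
    """offensiveness score = A*((B-C)/B)"""
    return a * ((b - c) / b)


def score_inverse(a: float, b: float, c: float) -> float:
    """offensiveness score = A*((B-(1/C))/B)

    Undefined for C == 0, so this sketch raises instead of guessing.
    """
    if c == 0:
        raise ValueError("C must be non-zero for the 1/C formula")
    return a * ((b - (1 / c)) / b)


def score_severity_normalized(a: float, c: float) -> float:
    """offensiveness score = Max(((A-C)/A), 0)"""
    return max((a - c) / a, 0.0)


def score_boolean(b: float, c: float, t: float) -> bool:
    """offensiveness score = (((B-C)/B) > T), a true/false result."""
    return ((b - c) / b) > t


# Worked numbers: severity 10, word length 4, distance 1, threshold 0.5.
print(score_linear(10, 4, 1))             # 7.5
print(score_inverse(10, 4, 2))            # 8.75
print(score_severity_normalized(10, 1))   # 0.9
print(score_boolean(4, 1, 0.5))           # True
```

One observable difference between the variants: the first formula falls as C grows, while the second rises as C grows, so the second only makes sense when a larger C means the words are closer (as with a similarity coefficient). That reading is an inference from the arithmetic, not something the claim states.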

16. A computer-implemented system, comprising:
- a data processor;
  
  a computer-readable memory encoded with instructions for commanding the data processors to execute steps including:
  
  receiving a plurality of offensive words, wherein each respective offensive word in the plurality of offensive words is associated with a severity score identifying the offensiveness of the respective word;
  
  receiving a string of words, wherein a candidate word is selected from the string of words;
  
  calculating, for each respective offensive word in the plurality of offensive words, a distance between the candidate word and the respective offensive word;
  
  calculating a plurality of offensiveness scores for the candidate word, each offensiveness score in the plurality of offensiveness scores based upon (i) the calculated distance between the candidate word and an offensive word in the plurality of offensive words and (ii) the severity score of the offensive word, wherein the plurality of offensiveness scores are calculated according to one or more of:
  
  offensiveness score = A*((B−C)/B);
  
  offensiveness score = A*((B−(1/C))/B);
  
  offensiveness score = Max(((A−C)/A), 0); and
  
  offensiveness score = (((B−C)/B) > T);
  
  wherein A is the severity score for an offensive word in the plurality of offensive words;
  
  B is a function of a length of the offensive word;
  
  C is the calculated distance between the candidate word and the offensive word; and
  
  T is a threshold value; and
  
  determining whether the candidate word is an offender word based on whether the highest offensiveness score in the plurality of offensiveness scores for the candidate word exceeds an offensiveness threshold value.
- Dependent claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30):
  - 17. The system of claim 16, wherein each word in the plurality of offensive words and each word in the string of words comprises an abbreviation, a single word, a phrase, or a sentence.
  - 18. The system of claim 16, wherein the distance is calculated as:
    - a Levenshtein distance, a Hamming distance, a Damerau-Levenshtein distance, a Dice coefficient, a Sørensen similarity index, or a Jaro-Winkler distance.
  - 19. The system of claim 16, wherein the offensiveness threshold value is set by a service administrator;
    - wherein the string of words is an input from a user to a service; and
      
      wherein the input from the user to the service is rejected if a candidate word in the string of words is identified as an offender word by having an offensiveness score exceeding the offensiveness threshold value set by the service administrator.
  - 20. The system of claim 19, wherein the service is a content review portal, and wherein the offensiveness threshold is set based on one of:
    - grouping of the content in which content being reviewed resides;
      
      a particular content with which the offensiveness threshold is associated; and
      
      a third-party content rating for content.
  - 21. The system of claim 20, wherein the grouping of the content comprises a particular classification of subject matter, a genre, geography of origin, wherein geography comprises country or countries, state, city, principality or collections of regions or subregions thereof, a group of professional or government certifications or ratings, or industry festival or event selections.
  - 22. The system of claim 19, wherein the service is selected from the group consisting of:
    - a message board;
      
      a content review portal;
      
      a chat room;
      
      a bulletin board system;
      
      a social networking website, and a multiplayer game.
  - 23. The system of claim 16, wherein:
    - the offensiveness threshold value is set by a user of a service;
      
      the string of words is an intended output from the service to the user; and
      
      the string of words containing a candidate word identified as an offender word by having an offensiveness score that exceeds the offensiveness threshold set by the user is modified prior to being displayed to the user.
  - 24. The system of claim 23, wherein the string of words is modified according to one of the following:
    - deleting the string of words such that the string of words is not displayed to the user;
      
      deleting the offensive word from the string of words so that the offensive word is not displayed to the user;
      
      censoring the string of words such that the string of words is not displayed to the user;
      
      or censoring the offensive word from the string of words so that the offensive word is not displayed to the user.
  - 25. The system of claim 23, wherein the plurality of offensive words and an offensiveness threshold are set based on cultural norms identified with the user.
  - 26. The system of claim 23, wherein the plurality of offensive words and an offensiveness threshold are set based upon definitions defined by a government institution having jurisdictional authority for a user or a non-governmental institution with which the user is associated.
  - 27. The system of claim 23, wherein a maximum offensiveness threshold is set for a user, and wherein the user cannot set the offensiveness threshold higher than the maximum offensiveness threshold.
  - 28. The system of claim 16, wherein the string of words containing a candidate word identified as an offender word by having an offensiveness score that exceeds the offensiveness threshold is rejected as input into the system.
  - 29. The system of claim 16, wherein the plurality of offensive words and severity score identifying each of the plurality of offensive words are identified by a user, a service administrator, a third-party, or any combination thereof.
  - 30. The system of claim 16, wherein the highest offensiveness score is one of:
    - a smallest value offensiveness score calculated in comparing each of the plurality of offensive words with the candidate word, or a largest value offensiveness score calculated in comparing each of the plurality of offensive words with the candidate word.
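
The dependent claims describe several ways of acting on an offender word: rejecting user input, deleting or censoring words in output shown to a user, and capping the threshold a user may choose. The sketch below illustrates those behaviours under stated assumptions: words are split on whitespace (the claims also allow phrases and sentences), censoring is done with asterisks, and `toy_score` is a hypothetical stand-in for whatever function returns a word's highest offensiveness score.

```python
from typing import Callable

# Hypothetical scorer type: maps a word to its highest offensiveness score.
Scorer = Callable[[str], float]


def effective_threshold(user_threshold: float, max_threshold: float) -> float:
    """Cap the user's chosen threshold at the configured maximum."""
    return min(user_threshold, max_threshold)


def accept_input(text: str, score: Scorer, threshold: float) -> bool:
    """Reject the whole input if any word's score exceeds the threshold."""
    return all(score(word) <= threshold for word in text.split())


def censor_output(text: str, score: Scorer, threshold: float, mask: str = "*") -> str:
    """Replace each offender word with mask characters of the same length."""
    return " ".join(
        mask * len(word) if score(word) > threshold else word
        for word in text.split()
    )


def delete_offenders(text: str, score: Scorer, threshold: float) -> str:
    """Drop offender words from the text entirely."""
    return " ".join(word for word in text.split() if score(word) <= threshold)


def toy_score(word: str) -> float:
    """Placeholder scorer: exact lookup in a made-up severity table."""
    return {"darn": 2.0}.get(word.lower().strip(".,!?"), 0.0)


limit = effective_threshold(user_threshold=5.0, max_threshold=1.0)
print(accept_input("well darn it", toy_score, limit))      # False
print(censor_output("well darn it", toy_score, limit))     # well **** it
print(delete_offenders("well darn it", toy_score, limit))  # well it
```

Passing the scorer in as a parameter keeps the handling policy separate from the distance and scoring choices, so the same functions work with any of the claimed formulas.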

Current Assignee: Ipar LLC
Original Assignee: Ipar LLC
Inventors: Spears, Joseph L.
Primary Examiner(s): WOZNIAK, JAMES S
Application Number: US12/696,991
Publication Number: US 20110191105A1
Time in Patent Office: 998 Days
Field of Search: 704/1, 704/9, 715/256, 715/257, 715/260, 715/267, 715/271, 707/754
US Class Current: 704/9
CPC Class Codes:
G06F 16/9535   Search customisation based ...
G06F 40/10   Text processing natural lan...
G06F 40/242   Dictionaries