Systems and methods for word offensiveness detection and processing using weighted dictionaries and normalization
Abstract
Computer-implemented systems and methods are provided for identifying language that would be considered obscene or otherwise offensive to a user or proprietor of a system. A plurality of offensive words are received, where each offensive word is associated with a severity score identifying the offensiveness of that word. A string of words is received. A distance between a candidate word and each offensive word in the plurality of offensive words is calculated, and a plurality of offensiveness scores for the candidate word are calculated, each offensiveness score based on the calculated distance between the candidate word and the offensive word and the severity score of the offensive word. A determination is made as to whether the candidate word is an offender word, where the candidate word is deemed to be an offender word when the highest offensiveness score in the plurality of offensiveness scores exceeds an offensiveness threshold value.
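The claims do not specify which distance is used between the candidate word and each offensive word. The sketch below assumes the Levenshtein edit distance, one common choice that catches character substitutions such as "d4rn" for "darn". The dictionary words, severity values, and the function name `levenshtein` are hypothetical placeholders.

```python
def levenshtein(a: str, b: str) -> int:
    """Assumed distance: the minimum number of single-character insertions,
    deletions, or substitutions that turn a into b (dynamic programming)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match for free
        prev = curr
    return prev[-1]

# Hypothetical dictionary: offensive word -> severity score (placeholder words).
OFFENSIVE = {"darn": 2.0, "heck": 1.0}

# One distance per dictionary word for a single candidate word.
candidate = "d4rn"
distances = {word: levenshtein(candidate, word) for word in OFFENSIVE}
print(distances)  # {'darn': 1, 'heck': 4}
```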
30 Claims
1. A method, comprising:
receiving, using one or more data processors, a plurality of offensive words, wherein each respective offensive word in the plurality of offensive words is associated with a severity score identifying the offensiveness of the respective word;
receiving, using the one or more data processors, a string of words, wherein a candidate word is selected from the string of words;
calculating, using the one or more data processors, for each respective offensive word in the plurality of offensive words, a distance between the candidate word and the respective offensive word;
calculating, using the one or more data processors, a plurality of offensiveness scores for the candidate word, each offensiveness score in the plurality of offensiveness scores based upon (i) the calculated distance between the candidate word and an offensive word in the plurality of offensive words and (ii) the severity score of the offensive word, wherein the plurality of offensiveness scores are calculated according to one or more of:
$\text{offensiveness score} = A \cdot \dfrac{B - C}{B}$;
$\text{offensiveness score} = A \cdot \dfrac{B - 1/C}{B}$;
$\text{offensiveness score} = \max\left(\dfrac{A - C}{A},\, 0\right)$; and
$\text{offensiveness score} = \left(\dfrac{B - C}{B} > T\right)$;
wherein:
A is the severity score for an offensive word in the plurality of offensive words;
B is a function of a length of the offensive word;
C is the calculated distance between the candidate word and the offensive word; and
T is a threshold value; and
determining, using the one or more data processors, whether the candidate word is an offender word based on whether the highest offensiveness score in the plurality of offensiveness scores for the candidate word exceeds an offensiveness threshold value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15.
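The four claimed scoring variants and the final decision can be sketched directly from the formulas. This sketch makes several assumptions. B is taken to be simply the offensive word's length, since the claim says only "a function of a length". The distances are treated as already computed. The boolean fourth variant is represented as 1.0 or 0.0. All function names are hypothetical.

```python
from typing import Callable, Dict

def score_linear(A: float, B: float, C: float) -> float:
    # A*((B - C)/B): equals A when C == 0 and falls linearly to 0 at C == B.
    return A * ((B - C) / B)

def score_inverse(A: float, B: float, C: float) -> float:
    # A*((B - (1/C))/B), as written in the claim; raises ZeroDivisionError when C == 0.
    return A * ((B - (1 / C)) / B)

def score_clamped(A: float, B: float, C: float) -> float:
    # Max(((A - C)/A), 0); B is unused in this variant.
    return max((A - C) / A, 0.0)

def make_threshold_score(T: float) -> Callable[[float, float, float], float]:
    # (((B - C)/B) > T): 1.0 if the length-normalized similarity exceeds T, else 0.0.
    def score(A: float, B: float, C: float) -> float:
        return 1.0 if (B - C) / B > T else 0.0
    return score

def is_offender(distances: Dict[str, int], severities: Dict[str, float],
                score: Callable[[float, float, float], float],
                offensiveness_threshold: float) -> bool:
    # One score per offensive word; only the highest is compared with the threshold.
    # Assumption: B is the offensive word's length.
    scores = [score(severities[w], float(len(w)), float(c)) for w, c in distances.items()]
    return max(scores) > offensiveness_threshold

severities = {"darn": 2.0, "heck": 1.0}  # hypothetical dictionary
distances = {"darn": 1, "heck": 4}       # e.g. edit distances from the candidate "d4rn"
print(is_offender(distances, severities, score_linear, 1.0))  # True
```

With these hypothetical values the highest score comes from "darn" (2.0 × 3/4 = 1.5), which exceeds the threshold of 1.0. Taken literally, the second variant grows as C grows and is undefined when C is 0, so the sketch reproduces it as written rather than reinterpreting it.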
16. A computer-implemented system, comprising:
a data processor;
a computer-readable memory encoded with instructions for commanding the data processor to execute steps including:
receiving a plurality of offensive words, wherein each respective offensive word in the plurality of offensive words is associated with a severity score identifying the offensiveness of the respective word;
receiving a string of words, wherein a candidate word is selected from the string of words;
calculating, for each respective offensive word in the plurality of offensive words, a distance between the candidate word and the respective offensive word;
calculating a plurality of offensiveness scores for the candidate word, each offensiveness score in the plurality of offensiveness scores based upon (i) the calculated distance between the candidate word and an offensive word in the plurality of offensive words and (ii) the severity score of the offensive word, wherein the plurality of offensiveness scores are calculated according to one or more of:
$\text{offensiveness score} = A \cdot \dfrac{B - C}{B}$;
$\text{offensiveness score} = A \cdot \dfrac{B - 1/C}{B}$;
$\text{offensiveness score} = \max\left(\dfrac{A - C}{A},\, 0\right)$; and
$\text{offensiveness score} = \left(\dfrac{B - C}{B} > T\right)$;
wherein:
A is the severity score for an offensive word in the plurality of offensive words;
B is a function of a length of the offensive word;
C is the calculated distance between the candidate word and the offensive word; and
T is a threshold value; and
determining whether the candidate word is an offender word based on whether the highest offensiveness score in the plurality of offensiveness scores for the candidate word exceeds an offensiveness threshold value.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30.
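For the system form, a hypothetical detector object can keep the dictionary in memory and screen every word of an incoming string as a candidate word. This sketch rests on several choices the claims leave open. It assumes the Levenshtein distance, B equal to the word length, and the first scoring variant. It also tokenizes on whitespace and strips edge punctuation. The class and method names are illustrative only, and the distance helper is repeated so the block runs on its own.

```python
import string
from typing import Dict, List

class OffensivenessDetector:
    """Hypothetical system-style wrapper: the dictionary is held in memory and
    each word of an incoming string is screened as a candidate word."""

    def __init__(self, severities: Dict[str, float], threshold: float) -> None:
        self.severities = {w.lower(): s for w, s in severities.items()}
        self.threshold = threshold

    @staticmethod
    def _distance(a: str, b: str) -> int:
        # Levenshtein edit distance (an assumed choice of "distance").
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
            prev = curr
        return prev[-1]

    def best_score(self, candidate: str) -> float:
        # Highest of the per-word scores A*((B - C)/B), with B = len(word).
        return max(
            severity * ((len(word) - self._distance(candidate, word)) / len(word))
            for word, severity in self.severities.items()
        )

    def find_offenders(self, text: str) -> List[str]:
        # Each whitespace-separated token, minus edge punctuation, is a candidate word.
        candidates = [t.strip(string.punctuation) for t in text.lower().split()]
        return [c for c in candidates if c and self.best_score(c) > self.threshold]

detector = OffensivenessDetector({"darn": 2.0, "heck": 1.0}, threshold=1.0)
print(detector.find_offenders("Well, d4rn it! What the heck?"))  # ['d4rn']
```

In this example "heck" matches a dictionary word exactly, but its hypothetical severity of 1.0 gives a best score of exactly 1.0. That score does not exceed the threshold, so only "d4rn" (best score 1.5) is flagged. This illustrates how the severity weighting, not closeness alone, drives the decision.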
Specification