Apparatus and method for detecting spam
First Claim
1. A method of detecting spam in web sites, the method comprising:
obtaining a text entry;
detecting a number of character transitions between character sets in the text entry, wherein the character sets (i) each correspond to different alphabets and (ii) are different subsets of a character encoding that maps characters from multiple alphabets to respective binary numbers for use by computers;
calculating, with a computer, a score indicative of the likelihood that the text entry is spam based on the number of character transitions; and
labeling the text entry as spam based on the score.
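The detecting step of claim 1 can be illustrated with a minimal sketch. It is not the patentee's implementation: the names `CHARACTER_SETS`, `character_set_of`, and `count_transitions` are hypothetical, Unicode is assumed as the character encoding, and the three code-point ranges are rough stand-ins for the claim's "character sets". Characters outside those ranges (digits, spaces, punctuation) are assumed to be ignored, which the claim itself does not specify.

```python
# Hypothetical, approximate code-point ranges standing in for "character sets":
# each is a subset of Unicode associated with one alphabet.
CHARACTER_SETS = {
    "latin": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
    "greek": [(0x0370, 0x03FF)],
    "cyrillic": [(0x0400, 0x04FF)],
}


def character_set_of(ch):
    """Return the name of the character set containing ch, or None."""
    code = ord(ch)
    for name, ranges in CHARACTER_SETS.items():
        if any(low <= code <= high for low, high in ranges):
            return name
    return None  # digits, spaces, punctuation, or an unlisted script


def count_transitions(text):
    """Count switches between character sets, skipping unclassified characters."""
    transitions = 0
    previous = None
    for ch in text:
        current = character_set_of(ch)
        if current is None:
            continue  # assumption: unclassified characters do not break a run
        if previous is not None and current != previous:
            transitions += 1
        previous = current
    return transitions
```

For example, a Latin word in which one letter has been replaced by a look-alike Cyrillic letter yields two transitions (into Cyrillic and back out), while text written entirely in one alphabet yields none.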
2 Assignments
0 Petitions
Abstract
Provided is a process of detecting spam in websites, the process including: obtaining text from a website; detecting an amount of transitions between character sets in the text, wherein the character sets each correspond to different alphabets; calculating, with a computer, a score indicative of the likelihood that the text is spam based on the amount of transitions; and labeling the text as spam based on the score.
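The scoring and labeling steps summarized in the abstract could look like the sketch below. The abstract does not say how the score is computed, so everything here is an assumption: the score is taken to be the fraction of adjacent classified-character pairs that switch character sets, and `spam_score`, `label_text`, and `SPAM_THRESHOLD` (with its value of 0.25) are hypothetical names and numbers chosen only for illustration. The transition count is supplied as an input from whatever detector is used.

```python
SPAM_THRESHOLD = 0.25  # hypothetical cutoff, not taken from the patent


def spam_score(transitions, classified_chars):
    """Assumed score: transitions divided by the number of adjacent pairs."""
    if classified_chars < 2:
        return 0.0  # no adjacent pair exists, so no transition is possible
    return transitions / (classified_chars - 1)


def label_text(transitions, classified_chars, threshold=SPAM_THRESHOLD):
    """Return a (label, score) pair; the label is 'spam' at or above the threshold."""
    score = spam_score(transitions, classified_chars)
    label = "spam" if score >= threshold else "not spam"
    return label, score
```

Normalizing by length is one design choice among several: it keeps a long multilingual page with a handful of legitimate script switches from scoring as high as a short entry that alternates alphabets inside a single word.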
36 Citations
20 Claims
1. A method of detecting spam in web sites, the method comprising:
obtaining a text entry;
detecting a number of character transitions between character sets in the text entry, wherein the character sets (i) each correspond to different alphabets and (ii) are different subsets of a character encoding that maps characters from multiple alphabets to respective binary numbers for use by computers;
calculating, with a computer, a score indicative of the likelihood that the text entry is spam based on the number of character transitions; and
labeling the text entry as spam based on the score.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
19. A tangible, non-transitory, machine-readable medium storing instructions that when executed by a data processing apparatus cause the data processing apparatus to perform operations comprising:
obtaining a text entry;
detecting a number of character transitions between character sets in the text, wherein the character sets (i) each correspond to different alphabets and (ii) are different subsets of a character encoding that maps characters from multiple alphabets to respective binary numbers for use by computers;
calculating, with a computer, a score indicative of the likelihood that the text entry is spam based on the number of character transitions; and
labeling the text entry as spam based on the score.
20. A system, comprising:
one or more processors; and
memory storing instructions that when executed by the processors cause the processors to effectuate operations comprising;
obtaining a text entry;
detecting a number of character transitions between character sets in the text, wherein the character sets (i) each correspond to different alphabets and (ii) are different subsets of a character encoding that maps characters from multiple alphabets to respective binary numbers for use by computers;
calculating, with a computer, a score indicative of the likelihood that the text entry is spam based on the number of character transitions; and
labeling the text entry as spam based on the score.
Specification