System and method for detecting spammers in a network environment
2 Assignments
0 Petitions
Abstract
A method is provided in one example embodiment and includes processing a first text created by a user into a first bag of words, the first bag of words comprising a list of words that appear in the text, each of the words having associated therewith a number representing a number of times the associated word appears in the text; and computing a similarity between the first bag of words and at least one second bag of words. The method further comprises comparing the computed similarity with a threshold; and determining that the user is a spammer if the computed similarity bears a first relationship with the threshold.
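As a rough illustration of the bag-of-words step described in the abstract, the following Python sketch maps each word of a text to the number of times it appears. The function name `bag_of_words`, the sample text, and the tokenization rule (lowercasing, then keeping runs of letters, digits, and apostrophes) are assumptions made for this example; the abstract does not say how words are delimited.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Return a mapping from each word in `text` to its occurrence count."""
    # Assumption: words are lowercase runs of letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


# Hypothetical user-profile text, used only for illustration.
profile = "Cheap watches! Buy cheap watches online."
bag = bag_of_words(profile)

for word, count in sorted(bag.items()):
    print(word, count)
```

A `Counter` is used here because it already pairs each word with a count, which matches the "list of words, each with an associated number" structure the abstract describes.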
7 Citations
17 Claims
1. A method comprising:
processing a first text created by a user using an online service into a first bag of words, the first bag of words comprising a list of words that appear in the first text, each of the words having associated therewith a number representing a number of times the associated word appears in the text;
computing a similarity between the first bag of words and at least one second bag of words, wherein the computing comprises, for each word in the first bag of words, determining a compare count comprising a minimum number of times the word appears in each of the first bag of words and the second bag of words and adding the compare count to a sum of counts, wherein the computed similarity comprises two times the sum of counts divided by the total number of words in the first bag of words and the second bag of words;
comparing the computed similarity with a threshold; and
determining that the user is a spammer and preventing the user from using the online service to create additional texts if the computed similarity is greater than the threshold, wherein the first text comprises a user profile of the user in connection with the online service.
Dependent claims: 2, 3, 4, 5, 6.
7. Logic encoded in one or more non-transitory tangible media that includes code for execution and when executed by a processor is operable to perform operations comprising:
processing a first text created by a user using an online service into a first bag of words, the first bag of words comprising a list of words that appear in the text, each of the words having associated therewith a number representing a number of times the associated word appears in the text;
computing a similarity between the first bag of words and at least one second bag of words, wherein the computing comprises, for each word in the first bag of words, determining a compare count comprising a minimum number of times the word appears in each of the first bag of words and the second bag of words and adding the compare count to a sum of counts, wherein the computed similarity comprises two times the sum of counts divided by the total number of words in the first bag of words and the second bag of words;
comparing the computed similarity with a threshold; and
determining that the user is a spammer and preventing the user from using the online service to create additional texts if the computed similarity is greater than the threshold, wherein the first text comprises a user profile of the user in connection with the online service.
Dependent claims: 8, 9, 10, 11, 12.
13. An apparatus, comprising:
a server that includes a processor and a memory, wherein the apparatus is configured to:
process a first text created by a user using an online service into a first bag of words, the first bag of words comprising a list of words that appear in the text, each of the words having associated therewith a number representing a number of times the associated word appears in the text;
compute a similarity between the first bag of words and at least one second bag of words, wherein the computing comprises, for each word in the first bag of words, determining a compare count comprising a minimum number of times the word appears in each of the first bag of words and the second bag of words and adding the compare count to a sum of counts, wherein the computed similarity comprises two times the sum of counts divided by the total number of words in the first bag of words and the second bag of words;
compare the computed similarity with a threshold; and
determine that the user is a spammer and preventing the user from using the online service to create additional texts if the computed similarity is greater than the threshold, wherein the first text comprises a user profile of the user in connection with the online service.
Dependent claims: 14, 15, 16, 17.
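The similarity computation recited in the independent claims can be sketched in Python as follows. This is an illustrative reading, not the patented implementation. The names `similarity`, `is_spammer`, and `known_bags`, the sample word counts, and the threshold values are hypothetical. "Total number of words" is assumed to mean the total word occurrences in both bags, an interpretation under which two identical texts score exactly 1.0. Returning 0.0 for two empty bags is likewise an assumption, since the claims do not address that case.

```python
from collections import Counter
from typing import Iterable


def similarity(first: Counter, second: Counter) -> float:
    """Two times the sum of per-word minimum counts, divided by total words."""
    # Assumption: "total number of words" counts occurrences, not distinct words.
    total = sum(first.values()) + sum(second.values())
    if total == 0:
        return 0.0  # Assumption: two empty texts are treated as dissimilar.
    sum_of_counts = 0
    for word, count in first.items():
        # Compare count: the smaller of the word's counts in the two bags.
        sum_of_counts += min(count, second.get(word, 0))
    return 2 * sum_of_counts / total


def is_spammer(profile_bag: Counter, known_bags: Iterable[Counter],
               threshold: float) -> bool:
    """Flag the user if any similarity is strictly greater than the threshold."""
    return any(similarity(profile_bag, other) > threshold for other in known_bags)


# Hypothetical bags of words, used only for illustration.
first = Counter({"cheap": 2, "watches": 2, "buy": 1, "online": 1})
second = Counter({"cheap": 1, "watches": 3, "buy": 1, "now": 1})

score = similarity(first, second)
print(round(score, 3))
print(is_spammer(first, [second], threshold=0.5))
```

For these sample bags the compare counts are 1, 2, 1, and 0, giving a sum of counts of 4, and each bag holds 6 words, so the score is 2 × 4 / 12. A caller following the claims would then block the user from creating further texts whenever `is_spammer` returns true; that enforcement step is outside this sketch.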
Specification