Spam flood detection methodologies
Abstract
A computer-implemented method and system are provided in which characteristics of a website are analyzed to determine whether the website represents a potential source of spam content. The analysis can include generating a characterizing signature of a webpage containing a content item, and obtaining an occurrence count for the generated characterizing signature. The characterizing signature is derived from formatting data of the webpage. When the obtained occurrence count is greater than a threshold count, the content item can be identified as spam content, and flagged as spam content.
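The counting step described in the abstract can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `SignatureCounter`, the in-memory `Counter` store, and the threshold value of 3 are hypothetical, since the abstract does not say how occurrence counts are stored or which threshold is used.

```python
from collections import Counter


class SignatureCounter:
    """Tracks how often each characterizing signature has been seen.

    Hypothetical helper for illustration; not taken from the patent text.
    """

    def __init__(self, threshold):
        self.threshold = threshold  # assumed value; the source names no number
        self.counts = Counter()     # assumed in-memory store

    def observe(self, signature):
        """Record one occurrence; return True once the count exceeds the threshold."""
        self.counts[signature] += 1
        return self.counts[signature] > self.threshold


counter = SignatureCounter(threshold=3)
# The same signature arrives five times; only the 4th and 5th exceed the threshold.
flags = [counter.observe("sig-a") for _ in range(5)]
```

Here `flags` records, for each arrival, whether the content item would be flagged as spam. The comparison is strictly "greater than", matching the wording of the abstract.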
12 Claims
1. A computer-implemented method comprising:
- analyzing characteristics of a content item of a webpage to determine whether the content item represents a potential source of spam content, wherein the analyzing comprises:
selecting, in accordance with predefined selection criteria, a plurality of hypertext markup language (HTML) tags of the webpage;
creating an ordered sequence of the selected plurality of HTML tags; and
applying a hash function to the ordered sequence of the selected plurality of HTML tags to map the selected plurality of HTML tags to a characterizing signature of the webpage;
updating an occurrence count for the characterizing signature in response to the characterizing signature being generated; and
when the occurrence count is greater than a threshold count, identifying the content item as spam content; and
flagging the content item of the webpage as spam content.
Dependent claims: 2, 3, 4
5. A computing system comprising a processor and a memory having computer-executable instructions stored thereon that, when executed by the processor, cause the computing system to:
- analyze characteristics of a content item of a webpage to determine whether the content item represents a potential source of spam content by causing the computing system to:
select, in accordance with predefined selection criteria, a plurality of hypertext markup language (HTML) tags of the webpage;
create an ordered sequence of the selected plurality of HTML tags;
map the ordered sequence of the selected plurality of HTML tags to a characterizing signature of a webpage by applying a hash function to the ordered sequence of the selected plurality of HTML tags to obtain a hash value that corresponds to the characterizing signature;
update an occurrence count for the characterizing signature in response to the characterizing signature being generated; and
when the occurrence count is greater than a threshold count, identify the content item as spam content; and
flag the content item of the webpage as spam content.
Dependent claims: 6, 7, 8
9. A tangible and non-transitory computer readable medium having computer-executable instructions stored thereon that, when executed by a processor, perform a method comprising:
- analyzing characteristics of a content item of a webpage to determine whether the content item represents a potential source of spam content, wherein the analyzing comprises:
selecting, in accordance with predefined selection criteria, a plurality of hypertext markup language (HTML) tags of a webpage;
creating an ordered sequence of the selected plurality of HTML tags; and
applying a hash function to the ordered sequence of the selected plurality of HTML tags to map the selected plurality of HTML tags to a characterizing signature of the webpage;
updating an occurrence count for the characterizing signature in response to the characterizing signature being generated; and
when the occurrence count is greater than a threshold count, identifying the content item as spam content; and
flagging the content item of the webpage as spam content.
Dependent claims: 10, 11, 12
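The signature-generation steps recited in the independent claims (select HTML tags, put them in an ordered sequence, hash the sequence) can be sketched in Python as follows. The sketch rests on several assumptions that are not in the claims: the selection criteria are modeled as a fixed allow-list `SELECTED_TAGS`, the ordered sequence is the document order of start tags joined with commas, and the hash function is SHA-256. The claims say only "predefined selection criteria" and "a hash function", so other choices would fit equally well. The names `TagCollector` and `characterizing_signature` are hypothetical.

```python
import hashlib
from html.parser import HTMLParser

# Assumed selection criteria: keep only these structural tags.
SELECTED_TAGS = frozenset({"div", "table", "tr", "td", "a", "img", "p"})


class TagCollector(HTMLParser):
    """Collects the selected start tags in the order they appear in the page."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase; attributes are ignored here.
        if tag in SELECTED_TAGS:
            self.tags.append(tag)


def characterizing_signature(html):
    """Map the ordered sequence of selected tags to a hex digest (assumed SHA-256)."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    ordered = ",".join(collector.tags)
    return hashlib.sha256(ordered.encode("utf-8")).hexdigest()


# Two pages with the same tag layout but different text, and one unrelated page.
page_a = "<div><p>Buy cheap watches</p><a href='http://x.example'>here</a></div>"
page_b = "<div><p>Win a free phone</p><a href='http://y.example'>now</a></div>"
page_c = "<table><tr><td>Quarterly report</td></tr></table>"

sig_a = characterizing_signature(page_a)
sig_b = characterizing_signature(page_b)
sig_c = characterizing_signature(page_c)
```

Because the signature depends only on the tag sequence, `page_a` and `page_b` yield the same value even though their text and links differ, while `page_c` yields a different one. Repeated signatures like these are what the occurrence count in the claims would accumulate.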
Specification