Method and system for filtering website content
First Claim
1. A computer implemented method for filtering content submitted by a user for dissemination over a communication forum, the method comprising the steps of:
(a) intercepting the content submitted by the user at the time of submission by the user to the communication forum;
(b) preprocessing a copy of said intercepted content through a preprocessing subroutine to yield a modified content by reducing said intercepted content to its least common denominator, wherein said preprocessing step further comprises the steps of:
(b1) analyzing said intercepted content for HTML tags, wherein when there are no HTML tags, performing steps (b2) through (b7), and when there are HTML tags, performing steps (b8) through (b12);
(b2) converting each white space to a space, wherein said white space is a one of a space, a tab, a return, an end of line character, and any other character that is displayed on a display device as said white space to a viewer;
(b3) removing each punctuation character at an end of a word, wherein said word is a string of characters;
(b4) converting each uppercase letter into a corresponding lowercase letter;
(b5) performing a character mapping on the results of said steps (b2), (b3), and (b4) of the intercepted content;
(b6) utilizing the results of said step (b5), changing a three or more of any consecutively repeated character to two of said consecutively repeated character or to a one of said consecutively repeated character based upon a predefined list; and
(b7) deleting any remaining spaces at the end of said intercepted content;
(b8) separating said HTML tags from a non-HTML text of said intercepted content;
(b9) concatenating said non-HTML text with a space where said HTML tag was located in said intercepted content;
(b10) sending said concatenated non-HTML text to said converting step (b2) for continued processing;
(b11) copying a text inside said HTML tags to a file; and
(b12) processing said text inside each said HTML tags through steps (b2), (b4), and (b7);
(c) breaking said modified content down through a content breakdown subroutine into a plurality of strings of words, wherein each successive string of words drops the first word from the previous string of words;
(d) processing each of said plurality of strings of words through a recursive comparison subroutine to attempt to identify at least one undesirable term that matches a previously identified undesirable term stored in a secondary database of undesirable terms, wherein each of said previously identified undesirable terms is a word or a phrase; and
(e) when said at least one undesirable term is identified, blocking the content submitted by the user to the communication forum from appearing on the communication forum.
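The no-HTML branch of the preprocessing subroutine, steps (b2) through (b7), can be sketched as below. This is a minimal illustration, not the patented implementation: the claim names a "character mapping" and a "predefined list" without giving their contents, so `CHAR_MAP`, `COLLAPSE_TO_ONE`, and `TRAILING_PUNCT` are hypothetical tables chosen only to make the example concrete.

```python
import re

# Hypothetical mapping for step (b5); the claim does not say which characters are mapped.
CHAR_MAP = str.maketrans({"@": "a", "$": "s", "0": "o", "1": "i", "3": "e"})

# Hypothetical "predefined list" for step (b6): runs of these characters collapse
# to one copy; runs of any other character collapse to two copies.
COLLAPSE_TO_ONE = set("aiuy")

# Assumed set of characters treated as end-of-word punctuation in step (b3).
TRAILING_PUNCT = ".,;:!?\"')]}"


def collapse_run(match: re.Match) -> str:
    """Replace a run of three or more identical characters per step (b6)."""
    ch = match.group(1)
    return ch if ch in COLLAPSE_TO_ONE else ch * 2


def preprocess_plain(text: str) -> str:
    """Sketch of steps (b2)-(b7) for content that contains no HTML tags."""
    # (b2) every white space character (tab, return, newline, ...) becomes a space
    text = re.sub(r"\s", " ", text)
    # (b3) strip punctuation characters from the end of each word
    text = " ".join(word.rstrip(TRAILING_PUNCT) for word in text.split(" "))
    # (b4) uppercase letters become lowercase
    text = text.lower()
    # (b5) character mapping on the result of (b2)-(b4)
    text = text.translate(CHAR_MAP)
    # (b6) three or more consecutive identical characters -> one or two copies
    text = re.sub(r"(.)\1{2,}", collapse_run, text)
    # (b7) delete any spaces left at the end
    return text.rstrip(" ")
```

With these assumed tables, `preprocess_plain("SOOOO $weet")` returns `"soo sweet"` and `preprocess_plain("Heyyyy")` returns `"hey"`: the mapping step undoes a common character substitution, and the run-collapsing step undoes letter stretching, so both reduce to forms that can be looked up in the term database.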
Abstract
A method and system for filtering website content prevents undesirable words or phrases from appearing in website postings sent by website users. The invention intercepts all content submitted by the user and processes it before posting it on the website. Intercepted content is first processed through a blocking subroutine, which first calls a preprocessing subroutine and then calls a content breakdown subroutine. The content breakdown subroutine utilizes a recursive comparison subroutine to identify undesirable words or phrases against previously identified words or phrases stored in a database. System options determine whether the inappropriate content is replaced with acceptable content before the message is posted, or the message is blocked entirely. The user may optionally be notified that their message has been blocked or replaced. The program then calls a matching subroutine for further processing of the intercepted content.
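The content breakdown and recursive comparison subroutines described above (steps (c) through (e) of the claims) can be sketched as follows. The claims say only that each successive string drops the first word of the previous one and that the comparison is recursive; the specific recursion used here (retry with the last word dropped until a stored term matches) is an assumption, and a Python `set` stands in for the secondary database of undesirable terms.

```python
from __future__ import annotations

from typing import Optional


def break_down(modified: str) -> list[list[str]]:
    """(c) Each successive string of words drops the first word of the previous one."""
    words = modified.split()
    return [words[i:] for i in range(len(words))]


def recursive_compare(words: list[str], terms: set[str]) -> Optional[str]:
    """(d) Assumed recursion: test the whole string against the stored terms,
    then retry with the last word dropped until a term matches or nothing is left."""
    if not words:
        return None
    candidate = " ".join(words)
    if candidate in terms:
        return candidate
    return recursive_compare(words[:-1], terms)


def find_undesirable(modified: str, terms: set[str]) -> list[str]:
    """Run every string produced by break_down through recursive_compare."""
    hits = []
    for string_of_words in break_down(modified):
        term = recursive_compare(string_of_words, terms)
        if term is not None:
            hits.append(term)
    return hits


def should_block(modified: str, terms: set[str]) -> bool:
    """(e) Block the submission when at least one undesirable term is found."""
    return len(find_undesirable(modified, terms)) > 0
```

With `terms = {"bad word", "jerk"}`, `find_undesirable("you are a bad word jerk", terms)` returns `["bad word", "jerk"]`. Combining suffixes with recursive prefix-shortening tries every starting word, and at each start the longest stored phrase wins, which is how multi-word phrases are caught wherever they begin.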
45 Claims
1. A computer implemented method for filtering content submitted by a user for dissemination over a communication forum; the full text of steps (a) through (e) is identical to the First Claim reproduced above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A computer implemented method for filtering content submitted by a user for dissemination over a communication forum, the method comprising the steps of:
(a) intercepting the content submitted by the user at the time of submission by the user to the communication forum;
(b) preprocessing a copy of said intercepted content through a preprocessing subroutine to yield a modified content by reducing said intercepted content to its least common denominator;
(c) breaking said modified content down through a content breakdown subroutine into a plurality of strings of words, wherein each successive string of words drops the first word from the previous string of words;
(d) processing each of said plurality of strings of words through a recursive comparison subroutine to attempt to identify at least one undesirable term that matches a previously identified undesirable term stored in a secondary database of undesirable terms, wherein each of said previously identified undesirable terms is a word or a phrase;
(e) when said at least one undesirable term is identified, blocking the content submitted by the user to the communication forum from appearing on the communication forum;
(f) storing in a file the content intercepted from the user, a user ID for the user, any OK indications, any undesirable terms, and any replacement string resulting from preprocessing step (b), breaking step (c), and processing step (d);
(g) processing said intercepted content through a matching subroutine to identify new permutations of undesirable terms, wherein said processing said intercepted content step further comprises the steps of:
(g1) repeating preprocessing step (b) for said intercepted content;
(g2) removing any white space from said intercepted content remaining after said preprocessing step (b);
(g3) processing said intercepted content through a matching breakdown subroutine to attempt to identify at least one matching phrase;
(g4) when said at least one matching phrase has been identified, determining if a first of said at least one matching phrase is already stored in a database of terms;
(g5) when said determining step (g4) result is yes, passing control to said determining step (g7) for continued processing;
(g6) when said determining step (g4) result is no, entering said at least one matching phrase into said database of terms as a not reviewed term;
(g7) determining if there is a next said at least one matching phrase;
(g8) when said determining step (g7) result is yes, passing control to said determining step (g4) for said next said at least one matching phrase; and
(g9) when said determining step (g7) result is no, passing control to said erasing step (h); and
(h) erasing said file.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17.
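The matching subroutine of step (g) can be sketched as below. The claim does not define the "matching breakdown subroutine", so this sketch assumes one plausible reading: after white space is removed (g2), look for runs of words that spell a known undesirable term, so a spaced-out variant such as "b a d word" is surfaced as a new permutation. A plain `dict` stands in for the database of terms, and the status string `"not reviewed"` mirrors the claim wording; both are illustrative assumptions.

```python
from __future__ import annotations


def matching_breakdown(preprocessed: str, undesirable_terms: set[str]) -> list[str]:
    """Hypothetical (g2)/(g3): find runs of words that spell a known undesirable
    term once the white space between them is removed."""
    words = preprocessed.split()
    keys = {term.replace(" ", "") for term in undesirable_terms}
    phrases = []
    for start in range(len(words)):
        joined = ""
        for end in range(start, len(words)):
            joined += words[end]  # (g2) concatenate with no white space
            if joined in keys:
                phrases.append(" ".join(words[start:end + 1]))
    return phrases


def matching_subroutine(preprocessed: str, undesirable_terms: set[str],
                        terms_db: dict[str, str]) -> list[str]:
    """(g4)-(g9): queue each matching phrase not yet in the terms database.
    `preprocessed` is assumed to be the output of repeating step (b), i.e. (g1)."""
    added = []
    for phrase in matching_breakdown(preprocessed, undesirable_terms):  # (g3)
        if phrase in terms_db:  # (g4) already stored -> (g5) go on to (g7)
            continue
        terms_db[phrase] = "not reviewed"  # (g6)
        added.append(phrase)
    # (g7)-(g9): no next phrase remains, so control passes to the erasing step (h)
    return added
```

The `for` loop plays the role of the explicit (g4) to (g8) control transfers in the claim: each phrase is checked once, known phrases are skipped, and unknown ones are queued for human review rather than being blocked automatically.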
18. A computer system for filtering content submitted by a user for dissemination over a communication forum, the computer system comprising:
a software program, stored in a computer readable storage medium, that when loaded into a memory and executed by the computer system intercepts the content submitted by a user at the time of submission by the user through an input device for dissemination over a communication forum;
a blocking subroutine of said software program for filtering said intercepted content;
a preprocessing subroutine of said software program for preprocessing a copy of said intercepted content to yield a modified content which reduces said intercepted content to its least common denominator, wherein said preprocessing subroutine:
analyzes said intercepted content for HTML tags, wherein when there are no HTML tags said preprocessing subroutine:
converts each white space to a space, wherein said white space is a one of a space, a tab, a return, an end of line character, and any other character that is displayed on a display device as said white space to a viewer;
removes each punctuation character at an end of a word, wherein said word is a string of characters;
converts each uppercase letter into a corresponding lowercase letter;
performs a character mapping on the intercepted content;
changes a three or more of any consecutively repeated character to two of said consecutively repeated character or to a one of said consecutively repeated character based upon a predefined list; and
deletes any remaining spaces at the end of said intercepted content;
wherein when there are HTML tags said preprocessing subroutine:
separates said HTML tags when present from a non-HTML text of said intercepted content;
concatenates said non-HTML text with a space where said HTML tag was located in said intercepted content;
sends said concatenated non-HTML text back to the beginning of said preprocessing subroutine for continued processing;
copies a text inside said HTML tags to a file; and
sends said text inside said HTML tags for simplified processing through said preprocessing subroutine by performing only a portion of the processing;
a content breakdown subroutine of said software program for breaking said modified content down into a plurality of strings of words, wherein each successive string of words drops the first word from the previous string of words;
a secondary database of undesirable terms accessed by said software program, wherein a list of previously identified undesirable terms is stored, and further wherein each of said previously identified undesirable terms is a word or a phrase; and
a recursive comparison subroutine of said software program for processing recursively each of said plurality of strings of words to identify at least one undesirable term that matches a one of said previously identified undesirable terms stored in said secondary database of undesirable terms;
wherein when said at least one undesirable term is identified, said blocking subroutine blocks the content submitted by the user to the communication forum from appearing on the communication forum.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28.
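The HTML branch of the preprocessing subroutine (steps (b8) through (b12) in claim 1, and the HTML clauses of claim 18) can be sketched as follows. Two assumptions are made and should be read as such: "text inside said HTML tags" is taken to mean the characters between the angle brackets (for example an `alt` attribute value), and a simple regular expression stands in for real tag detection, which a production system would more likely hand to an HTML parser. The `full_preprocess` callable stands in for steps (b2) through (b7), and a Python list stands in for the claimed file.

```python
from __future__ import annotations

import re
from typing import Callable

# Assumed tag syntax: anything between "<" and the next ">".
TAG = re.compile(r"<[^>]*>")


def simplified_processing(tag_text: str) -> str:
    """Only steps (b2), (b4) and (b7): white space to spaces, lowercase, trim end."""
    return re.sub(r"\s", " ", tag_text).lower().rstrip(" ")


def preprocess_html(content: str,
                    full_preprocess: Callable[[str], str]) -> tuple[str, list[str]]:
    """Sketch of (b8)-(b12); `full_preprocess` stands in for steps (b2)-(b7)."""
    # (b11) copy the text inside each tag (a list stands in for the claimed file)
    # and (b12) run each copy through the simplified processing only
    tag_texts = [simplified_processing(m.group(0)[1:-1]) for m in TAG.finditer(content)]
    # (b8)/(b9) separate the tags, leaving a space where each tag was
    non_html = TAG.sub(" ", content)
    # (b10) send the remaining text through the full preprocessing path
    return full_preprocess(non_html), tag_texts
```

Replacing each tag with a space, rather than deleting it, keeps words on either side of a tag from being fused together; `"Hi<b>THERE</b> you"` yields the visible text `"hi there you"` plus the tag texts `["b", "/b"]` when `full_preprocess` simply lowercases and normalizes spaces.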
29. A computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform a method for filtering content submitted by a user for dissemination over a communication forum, the method comprising the steps of:
(a) intercepting the content submitted by the user at the time of submission by the user to the communication forum;
(b) preprocessing a copy of said intercepted content through a preprocessing subroutine to yield a modified content by reducing said intercepted content to its least common denominator, wherein said preprocessing step further comprises the steps of:
(b1) analyzing said intercepted content for HTML tags, wherein when there are no HTML tags, performing steps (b2) through (b7), and when there are HTML tags, performing steps (b8) through (b12);
(b2) converting each white space to a space, wherein said white space is a one of a space, a tab, a return, an end of line character, and any other character that is displayed on a display device as said white space to a viewer;
(b3) removing each punctuation character at an end of a word, wherein said word is a string of characters;
(b4) converting each uppercase letter into a corresponding lowercase letter;
(b5) performing a character mapping on the results of said steps (b2), (b3), and (b4) of the intercepted content;
(b6) utilizing the results of said step (b5), changing a three or more of any consecutively repeated character to two of said consecutively repeated character or to a one of said consecutively repeated character based upon a predefined list; and
(b7) deleting any remaining spaces at the end of said intercepted content;
(b8) separating said HTML tags from a non-HTML text of said intercepted content;
(b9) concatenating said non-HTML text with a space where said HTML tag was located in said intercepted content;
(b10) sending said concatenated non-HTML text to said converting step (b2) for continued processing;
(b11) copying a text inside said HTML tags to a file; and
(b12) processing said text inside each said HTML tags through steps (b2), (b4), and (b7);
(c) breaking said modified content down through a content breakdown subroutine into a plurality of strings of words, wherein each successive string of words drops the first word from the previous string of words;
(d) processing each of said plurality of strings of words through a recursive comparison subroutine to attempt to identify at least one undesirable term that matches a previously identified undesirable term stored in a secondary database of undesirable terms, wherein each of said previously identified undesirable terms is a word or a phrase; and
(e) when said at least one undesirable term is identified, blocking the content submitted by the user to the communication forum from appearing on the communication forum.
Dependent claims: 30, 31, 32, 33, 34, 35, 36.
37. A computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform a method for filtering content submitted by a user for dissemination over a communication forum, the method comprising the steps of:
(a) intercepting the content submitted by the user at the time of submission by the user to the communication forum;
(b) preprocessing a copy of said intercepted content through a preprocessing subroutine to yield a modified content by reducing said intercepted content to its least common denominator;
(c) breaking said modified content down through a content breakdown subroutine into a plurality of strings of words, wherein each successive string of words drops the first word from the previous string of words;
(d) processing each of said plurality of strings of words through a recursive comparison subroutine to attempt to identify at least one undesirable term that matches a previously identified undesirable term stored in a secondary database of undesirable terms, wherein each of said previously identified undesirable terms is a word or a phrase;
(e) when said at least one undesirable term is identified, blocking the content submitted by the user to the communication forum from appearing on the communication forum;
(f) storing in a file the content intercepted from the user, a user ID for the user, any OK indications, any undesirable terms, and any replacement string resulting from preprocessing step (b), breaking step (c), and processing step (d);
(g) processing said intercepted content through a matching subroutine to identify new permutations of undesirable terms, wherein said processing said intercepted content step further comprises the steps of:
(g1) repeating preprocessing step (b) for said intercepted content;
(g2) removing any white space from said intercepted content remaining after said preprocessing step (b);
(g3) processing said intercepted content through a matching breakdown subroutine to attempt to identify at least one matching phrase;
(g4) when said at least one matching phrase has been identified, determining if a first of said at least one matching phrase is already stored in a database of terms;
(g5) when said determining step (g4) result is yes, passing control to said determining step (g7) for continued processing;
(g6) when said determining step (g4) result is no, entering said at least one matching phrase into said database of terms as a not reviewed term;
(g7) determining if there is a next said at least one matching phrase;
(g8) when said determining step (g7) result is yes, passing control to said determining step (g4) for said next said at least one matching phrase; and
(g9) when said determining step (g7) result is no, passing control to said erasing step (h); and
(h) erasing said file.
Dependent claims: 38, 39, 40, 41, 42, 43, 44, 45.
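The temporary record described in steps (f) and (h) of claims 9 and 37 can be sketched as below: the intercepted content and the filtering results are written to a file, the matching subroutine (g) reads from it, and the file is then erased. The JSON layout, field names, and use of a temporary file are assumptions for illustration; the claims specify only what is stored and that the file is erased afterwards.

```python
from __future__ import annotations

import json
import os
import tempfile


def store_record(content: str, user_id: str, ok: bool,
                 undesirable_terms: list[str], replacement: str | None) -> str:
    """(f) Write the intercepted content and the filter results to a file.
    The JSON layout and field names are assumptions, not taken from the claims."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({
            "content": content,
            "user_id": user_id,
            "ok": ok,
            "undesirable_terms": undesirable_terms,
            "replacement": replacement,
        }, fh)
    return path


def erase_record(path: str) -> None:
    """(h) Erase the file after the matching subroutine (g) has run."""
    os.remove(path)
```

A caller would store the record right after the blocking decision, pass the path to the matching step, and call `erase_record(path)` once step (g9) hands control to step (h), so the intercepted content does not persist after review candidates have been queued.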
Specification