Systems and methods for tokenizing user-generated content to enable the prevention of attacks
First Claim
1. A method for tokenizing user-generated content, using a computer implemented security system, the method comprising:
pre-processing a user-generated content input string utilizing a secondary input of target language, wherein the pre-processing converts existing text into a token stream text node at the start of an HTML tag for insertion into a token stream; and
extracting tokens, using a processor, from the pre-processed user-generated content string to generate the token stream, wherein the token stream is yielded to a caller rather than the user-generated content to prevent attacks on the user-generated content;
wherein the extraction of tokens from the pre-processed user-generated content requires scanning the pre-processed user-generated content string by individual runes, and sending each rune to a specific buffer based upon signaling individual finite state machines.
Abstract
The present invention relates to systems and methods for the tokenization of user-generated content in order to prevent attacks on the user-generated content. The systems and methods initially pre-process the user-generated content string utilizing a secondary input of target language. Pre-processing may also include initialization of finite state machines, token markers and string buffers (text, HTML tag name, HTML attribute name, HTML attribute value, CSS selector, CSS property name, and CSS property value). The user-generated content string is scanned by rune, and the system sends each rune to a specific buffer based upon signaling by individual finite state machine states. Buffers are then converted to token stream nodes to be inserted into the token stream. The tokens represent a string of characters and are symbolically categorized according to activated finite state machine states.
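The rune-by-rune scan described in the abstract can be illustrated with a minimal Go sketch. It is not the patented implementation: it assumes a single simplified finite state machine with four states and four buffers (text, tag name, attribute name, attribute value), omits the CSS buffers and the target-language input, and treats a closing tag such as </a> as a tag name of "/a". The names Tokenize, Token and TokenKind are hypothetical and chosen only for illustration.

```go
package main

import "strings"

// TokenKind categorizes a token by the buffer it was collected in.
// (Hypothetical names; the CSS categories from the abstract are omitted.)
type TokenKind int

const (
	Text TokenKind = iota
	TagName
	AttrName
	AttrValue
)

// Token is one node of the token stream.
type Token struct {
	Kind  TokenKind
	Value string
}

// state is the current state of the simplified finite state machine.
type state int

const (
	inText state = iota
	inTagName
	inAttrName
	inAttrValue
)

// Tokenize scans input one rune at a time. The active state decides which
// buffer receives each rune; on a state change the buffer is flushed into
// the token stream as a node.
func Tokenize(input string) []Token {
	var (
		tokens []Token
		bufs   [4]strings.Builder // one buffer per TokenKind
		st     = inText
		quoted bool // true while inside a double-quoted attribute value
	)
	// flush converts a non-empty buffer into a token stream node.
	flush := func(k TokenKind) {
		if bufs[k].Len() > 0 {
			tokens = append(tokens, Token{Kind: k, Value: bufs[k].String()})
			bufs[k].Reset()
		}
	}
	for _, r := range input { // ranging over a string yields runes
		switch st {
		case inText:
			if r == '<' {
				flush(Text) // pending text becomes a text node at the start of a tag
				st = inTagName
			} else {
				bufs[Text].WriteRune(r)
			}
		case inTagName:
			switch r {
			case '>':
				flush(TagName)
				st = inText
			case ' ':
				flush(TagName)
				st = inAttrName
			default:
				bufs[TagName].WriteRune(r)
			}
		case inAttrName:
			switch r {
			case '=':
				flush(AttrName)
				st = inAttrValue
			case '>':
				flush(AttrName)
				st = inText
			case ' ':
				flush(AttrName)
			default:
				bufs[AttrName].WriteRune(r)
			}
		case inAttrValue:
			switch {
			case r == '"':
				quoted = !quoted
				if !quoted { // closing quote ends the value
					flush(AttrValue)
					st = inAttrName
				}
			case !quoted && r == ' ':
				flush(AttrValue)
				st = inAttrName
			case !quoted && r == '>':
				flush(AttrValue)
				st = inText
			default:
				bufs[AttrValue].WriteRune(r)
			}
		}
	}
	// Flush whatever is still buffered when the input ends.
	for k := Text; k <= AttrValue; k++ {
		flush(k)
	}
	return tokens
}
```

Under these assumptions, calling Tokenize on the string Hi <a href="x y" title=z>link</a> returns eight tokens: a text node, a tag name, two attribute name and value pairs, a second text node, and the closing tag name. A caller that receives only this categorized stream can inspect or filter each node without parsing the raw user-generated string itself.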
18 Claims
1. A method for tokenizing user-generated content, using a computer implemented security system, the method comprising:
pre-processing a user-generated content input string utilizing a secondary input of target language, wherein the pre-processing converts existing text into a token stream text node at the start of an HTML tag for insertion into a token stream; and
extracting tokens, using a processor, from the pre-processed user-generated content string to generate the token stream, wherein the token stream is yielded to a caller rather than the user-generated content to prevent attacks on the user-generated content;
wherein the extraction of tokens from the pre-processed user-generated content requires scanning the pre-processed user-generated content string by individual runes, and sending each rune to a specific buffer based upon signaling individual finite state machines.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A security system for tokenizing user-generated content comprising:
a pre-processor configured to process a user-generated content input string utilizing a secondary input of target language, wherein the pre-processor converts existing text into a token stream text node at the start of an HTML tag for insertion into a token stream; and
a tokenizer, including a processor, configured to extract tokens from the pre-processed user-generated content string to generate the token stream, wherein the token stream is yielded to a caller rather than the user-generated content to prevent attacks on the user-generated content;
wherein the tokenizer scans the pre-processed user-generated content by individual runes, and sends each rune to a specific buffer based upon signaling individual finite state machine states.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
Specification