Systems and methods for parsing user-generated content to prevent attacks
First Claim
1. A method for parsing a token stream symbolizing user generated content, using a computer implemented security system, the method comprising:
- removing tokens, using a processor, from the token stream to generate a sanitized token stream, wherein the removal of tokens is performed by:
iterating over the token stream while filtering for nodes that are hypertext markup language tags, and cross referencing the tag against a whitelist;
if the tag is in the whitelist, then iterating through the attributes of the tag and cross referencing the attributes against the whitelist;
iterating through protocol-based hypertext markup language attributes to identify a valid URL, and cross referencing the valid URL with the whitelist;
iterating through cascade style sheet selectors within <style> and <link> tags and cross referencing the cascade style sheet selector with the whitelist;
if the cascade style sheet selector is in the whitelist, then iterating through properties for the cascade style sheet selector in <style>/<link> tags or as “style” attributes on a specific hypertext markup language tag, and cross referencing the properties against the whitelist; and
removing any token which is not found in the whitelist when cross-referenced.
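The tag, attribute, and URL steps of the claim can be sketched in a few lines of Python. This is only an illustrative reading of the claim language, not the patented implementation: the `Token` class, the contents of `WHITELIST`, the `URL_ATTRIBUTES` set, and the decision to reject scheme-less (relative) URLs are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit

# Hypothetical whitelist; the claim does not enumerate its contents.
WHITELIST = {
    "tags": {"a", "b", "p", "img"},
    "attributes": {"href", "src", "title"},
    "protocols": {"http", "https", "mailto"},
}
# Assumed set of "protocol-based" attributes, i.e. attributes that carry a URL.
URL_ATTRIBUTES = {"href", "src"}


@dataclass
class Token:
    kind: str                                  # "start_tag", "end_tag", or "text"
    name: str = ""                             # tag name; empty for text tokens
    attrs: dict = field(default_factory=dict)  # attribute name -> value
    data: str = ""                             # text content for text tokens


def url_allowed(value: str) -> bool:
    """Return True only if the URL's scheme is in the protocol whitelist."""
    try:
        scheme = urlsplit(value.strip()).scheme.lower()
    except ValueError:  # malformed URL
        return False
    # Assumption: URLs without a scheme (relative URLs) are rejected.
    return scheme in WHITELIST["protocols"]


def sanitize(tokens):
    """Drop tags, attributes, and URL values that are not whitelisted."""
    out = []
    for tok in tokens:
        if tok.kind == "text":
            out.append(tok)  # text nodes pass through unchanged here
            continue
        name = tok.name.lower()
        if name not in WHITELIST["tags"]:
            continue  # tag not in whitelist: remove the token
        kept = {}
        for attr, value in tok.attrs.items():
            key = attr.lower()
            if key not in WHITELIST["attributes"]:
                continue  # attribute not in whitelist
            if key in URL_ATTRIBUTES and not url_allowed(value):
                continue  # URL protocol not in whitelist
            kept[key] = value
        out.append(Token(tok.kind, name, kept, tok.data))
    return out
```

For example, a stream containing `<a href="javascript:alert(1)" title="x" onclick="evil()">` keeps the `a` tag but only its `title` attribute, while a `<script>` start tag and its end tag are dropped entirely. Text tokens are left alone in this sketch; the abstract describes encoding them as a separate step.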
Abstract
The present invention relates to systems and methods for parsing of a token stream for user generated content in order to prevent attacks on the user generated content. The systems and methods include a database which stores one or more whitelists, and a parser. The parser removes tokens from the token stream by comparing the tokens against the whitelist. Next, the parser validates CSS property values, encodes data within attribute values and text nodes, reconciles closing HTML tags, and coerces media tags into safe variants. The tokens removed may be any of HTML tags, HTML attributes, HTML protocols, CSS selectors and CSS properties.
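Two of the follow-up steps named in the abstract, encoding text nodes and reconciling closing HTML tags, might look like the sketch below. The abstract does not say how either step is performed, so the stack-based approach, the `("start" | "end" | "text", value)` token shape, and the `VOID_TAGS` set are assumptions chosen for illustration.

```python
import html

# Assumed subset of tags that never take a closing tag.
VOID_TAGS = {"br", "img", "hr"}


def encode_text(data: str) -> str:
    """Escape &, <, >, and quotes so the text cannot be read as markup."""
    return html.escape(data, quote=True)


def reconcile_closing_tags(tokens):
    """Balance tags in a list of ("start" | "end" | "text", value) tuples."""
    out, open_tags = [], []
    for kind, value in tokens:
        if kind == "start":
            out.append((kind, value))
            if value not in VOID_TAGS:
                open_tags.append(value)
        elif kind == "end":
            if value not in open_tags:
                continue  # stray closing tag with no matching opener: drop it
            # Close any tags still open inside the one being closed.
            while open_tags:
                name = open_tags.pop()
                out.append(("end", name))
                if name == value:
                    break
        else:
            out.append(("text", encode_text(value)))
    # Close whatever is still open at the end of the stream.
    while open_tags:
        out.append(("end", open_tags.pop()))
    return out
```

Given `<p><b>` followed by the text `<x>`, then `</p>` and a stray `</i>`, this sketch emits the escaped text `&lt;x&gt;`, inserts the missing `</b>` before `</p>`, and discards the unmatched `</i>`.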
18 Claims
1. A method for parsing a token stream symbolizing user generated content, using a computer implemented security system, the method comprising:
removing tokens, using a processor, from the token stream to generate a sanitized token stream, wherein the removal of tokens is performed by:
iterating over the token stream while filtering for nodes that are hypertext markup language tags, and cross referencing the tag against a whitelist;
if the tag is in the whitelist, then iterating through the attributes of the tag and cross referencing the attributes against the whitelist;
iterating through protocol-based hypertext markup language attributes to identify a valid URL, and cross referencing the valid URL with the whitelist;
iterating through cascade style sheet selectors within <style> and <link> tags and cross referencing the cascade style sheet selector with the whitelist;
if the cascade style sheet selector is in the whitelist, then iterating through properties for the cascade style sheet selector in <style>/<link> tags or as “style” attributes on a specific hypertext markup language tag, and cross referencing the properties against the whitelist; and
removing any token which is not found in the whitelist when cross-referenced.
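The CSS portion of the claim (selectors first, then the properties of each whitelisted selector) can be illustrated with a deliberately small filter. The regular expression below handles only flat `selector { declarations }` rules, and the contents of `CSS_WHITELIST` are invented for the example; a production parser would need a real CSS tokenizer, and property values are not validated here.

```python
import re

# Hypothetical whitelist of CSS selectors and properties.
CSS_WHITELIST = {
    "selectors": {"p", "h1", ".comment"},
    "properties": {"color", "font-weight", "text-align"},
}

# Matches flat rules only; nested at-rules are out of scope for this sketch.
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def filter_declarations(block: str) -> str:
    """Keep only whitelisted properties from a 'prop: value; ...' string.

    Limitation: splitting on ';' breaks values that contain a semicolon.
    """
    kept = []
    for decl in block.split(";"):
        prop, sep, value = decl.partition(":")
        name = prop.strip().lower()
        if sep and name in CSS_WHITELIST["properties"]:
            kept.append(f"{name}: {value.strip()}")
    return "; ".join(kept)


def sanitize_stylesheet(css: str) -> str:
    """Drop rules whose selector is not whitelisted, then filter properties."""
    rules = []
    for selector, block in RULE_RE.findall(css):
        selector = selector.strip()
        if selector not in CSS_WHITELIST["selectors"]:
            continue  # selector not in whitelist: remove the whole rule
        rules.append(f"{selector} {{ {filter_declarations(block)} }}")
    return "\n".join(rules)
```

`sanitize_stylesheet` covers the contents of a style sheet, while `filter_declarations` can be applied directly to the value of an inline “style” attribute, which has no selector of its own.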
10. A security system for parsing a token stream symbolizing user generated content comprising:
a database, stored in a memory storage device, including a whitelist; and
a parser configured by a processor device to generate a sanitized token stream by:
iterating over the token stream while filtering for nodes that are hypertext markup language tags, and cross referencing the tag against the whitelist;
if the tag is in the whitelist, then iterating through the attributes of the tag and cross referencing the attributes against the whitelist;
iterating through protocol-based hypertext markup language attributes to identify a valid URL, and cross referencing the valid URL with the whitelist;
iterating through cascade style sheet selectors within <style> and <link> tags and cross referencing the cascade style sheet selector with the whitelist;
if the cascade style sheet selector is in the whitelist, then iterating through properties for the cascade style sheet selector in <style>/<link> tags or as “style” attributes on a specific hypertext markup language tag, and cross referencing the properties against the whitelist; and
removing any token which is not found in the whitelist when cross-referenced.
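Claim 10 pairs a database holding the whitelist with a parser that consults it. One way to picture that split is the sketch below, which uses an in-memory SQLite database purely as a stand-in. The table layout, the category names, the sample rows, and the `Parser` class are all hypothetical; the claim specifies none of them.

```python
import sqlite3


def build_whitelist_db() -> sqlite3.Connection:
    """Create an in-memory database holding a small, made-up whitelist."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE whitelist ("
        "category TEXT, value TEXT, PRIMARY KEY (category, value))"
    )
    conn.executemany(
        "INSERT INTO whitelist VALUES (?, ?)",
        [
            ("tag", "p"), ("tag", "a"),
            ("attribute", "href"),
            ("protocol", "https"),
            ("css_selector", "p"),
            ("css_property", "color"),
        ],
    )
    conn.commit()
    return conn


class Parser:
    """Loads the whitelist once, then filters (category, value) tokens."""

    def __init__(self, conn: sqlite3.Connection):
        self.allowed = {}
        rows = conn.execute("SELECT category, value FROM whitelist")
        for category, value in rows:
            self.allowed.setdefault(category, set()).add(value)

    def is_allowed(self, category: str, value: str) -> bool:
        return value.lower() in self.allowed.get(category, set())

    def sanitize(self, tokens):
        """Remove any token not found in the whitelist."""
        return [(c, v) for c, v in tokens if self.is_allowed(c, v)]
```

Keeping the whitelist in a store separate from the parser means the allowed tags, attributes, protocols, selectors, and properties can be changed without touching the parsing code, which appears to be the point of the database element in the claim.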
Specification