Methods and systems for detecting unwanted web contents
First Claim
1. A method of detecting unwanted web contents, the method to be performed by a first computer and a second computer that each comprises a processor and a memory, the method comprising:
the first computer receiving a first web page from a first website;
the first computer extracting a plurality of hypertext markup language (HTML) tags from the first web page;
the first computer generating page structure traits of the first web page by forming the plurality of HTML tags together into a pattern that comprises the plurality of HTML tags;
the first computer comparing the page structure traits of the first web page to page structure traits of a normal web page;
to prevent false positives, the first computer removing from the page structures of the first web page a feature that makes the page structure traits of the normal web page match the page structure traits of the first web page;
the second computer receiving the page structure traits of the first web page after the feature has been removed from the page structure traits of the first web page; and
the second computer detecting unwanted web content in a second web page received from a second website by comparing page structure traits of the second web page against the page structure traits of the first web page.
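The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim only says the HTML tags are "formed together into a pattern", so the use of overlapping three-tag sequences as "features", the set-difference pruning, the overlap threshold, and all function names below are assumptions made for illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the names of opening HTML tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def extract_tags(html):
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def page_structure_traits(html, n=3):
    # Assumption: a "pattern" is modeled as the set of n-tag sequences.
    tags = extract_tags(html)
    return {">".join(tags[i:i + n]) for i in range(len(tags) - n + 1)}


def prune_against_normal(traits, normal_traits):
    # Drop features shared with a normal page to reduce false positives.
    return traits - normal_traits


def looks_unwanted(page_traits, malicious_traits, threshold=0.5):
    # Assumption: a page matches when enough malicious features reappear.
    if not malicious_traits:
        return False
    overlap = len(page_traits & malicious_traits) / len(malicious_traits)
    return overlap >= threshold


# "First computer": derive traits from a known-bad page, then prune them.
first_page = "<html><body><div><iframe></iframe><script></script></div></body></html>"
normal_page = "<html><body><div><p>hi</p></div></body></html>"
pruned = prune_against_normal(
    page_structure_traits(first_page), page_structure_traits(normal_page)
)

# "Second computer": compare a newly received page against the pruned traits.
second_page = (
    "<html><head></head><body><div><iframe></iframe>"
    "<script></script></div></body></html>"
)
verdict = looks_unwanted(page_structure_traits(second_page), pruned)
```

In this toy run, the feature `html>body>div` is removed because the normal page also contains it, which is the kind of pruning the claim describes as preventing false positives.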
Abstract
Unwanted web contents are detected in an endpoint computer. The endpoint computer receives a web page from a website. The reputation of the website is determined and the web page is scanned for malicious codes to protect the endpoint computer from web threats. To further protect the endpoint computer from web threats including mutating unwanted web contents, page structure traits of the web page are generated and compared to page structure traits of other web pages detected to contain unwanted web contents.
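The layered checks described in the abstract (website reputation, a scan for malicious code, then a page structure comparison) might be arranged as in the following sketch. The abstract does not say how any of these checks is performed, so the host blocklist, the substring signatures, the regex-based tag extraction, and every name below are hypothetical stand-ins.

```python
import re
from urllib.parse import urlparse

# Crude opening-tag matcher; closing tags start with "</" and are skipped.
TAG_RE = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)")


def traits_of(html, n=3):
    # Assumption: traits are modeled as the set of n-tag sequences.
    tags = [t.lower() for t in TAG_RE.findall(html)]
    return {">".join(tags[i:i + n]) for i in range(len(tags) - n + 1)}


def check_page(url, html, bad_hosts, code_signatures, malicious_trait_sets):
    """Return a verdict string after three successive checks."""
    # 1. Reputation of the website (here: a simple host blocklist).
    host = urlparse(url).hostname or ""
    if host in bad_hosts:
        return "blocked: reputation"
    # 2. Scan for known malicious code (here: plain substring signatures).
    if any(sig in html for sig in code_signatures):
        return "blocked: malicious code"
    # 3. Compare page structure traits against known-bad trait sets.
    page = traits_of(html)
    for known in malicious_trait_sets:
        if known and known <= page:
            return "blocked: page structure"
    return "allowed"


bad_hosts = {"bad.example"}
signatures = ["eval(unescape("]
known_traits = [{"div>iframe>script"}]
suspect = "<html><body><div><iframe></iframe><script></script></div></body></html>"
clean = "<html><body><p>hi</p></body></html>"
```

The third check only runs when the first two pass, so a page from an unknown site whose code has mutated can still be caught if its structure resembles a previously detected page.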
13 Claims
11. A method of generating page structure traits for detecting unwanted web contents, the method to be performed by a server computer comprising a processor and a memory, the method comprising:
the server computer retrieving web pages from a plurality of websites;
the server computer analyzing the web pages for unwanted web contents;
the server computer generating malicious page structure traits of web pages detected to have unwanted web contents based on the analysis of the web pages, each of the malicious page structure traits comprising a plurality of hypertext markup language (HTML) tags that are extracted from a corresponding web page and formed together into a pattern and matching page structure traits of a plurality of web pages;
the server computer comparing the malicious page structure traits to normal page structure traits of normal web pages;
to prevent false positives, the server computer removing from the malicious page structure traits features that make the normal page structure traits of normal web pages match the malicious page structure traits;
after removing the features from the malicious page structure traits, the server computer providing the malicious page structure traits to a plurality of endpoint computers; and
in the plurality of endpoint computers, comparing the malicious page structure traits to page structure traits of other web pages to detect unwanted web contents in the other web pages.
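The server-side generation in claim 11 could be sketched as follows. This is an illustration under stated assumptions, not the claimed method: each page is assumed to be already reduced to a set of feature strings, a feature is kept when it appears in at least `min_pages` malicious pages (one reading of "matching page structure traits of a plurality of web pages"), and the function name and parameters are hypothetical.

```python
from collections import Counter


def build_malicious_traits(malicious_pages, normal_pages, min_pages=2):
    """Derive a trait set for distribution to endpoint computers.

    malicious_pages, normal_pages: lists of sets of feature strings,
    one set per analyzed web page.
    """
    # Count in how many malicious pages each feature occurs.
    counts = Counter(f for page in malicious_pages for f in set(page))
    shared = {f for f, c in counts.items() if c >= min_pages}
    # Remove every feature that also occurs in a normal page, so that
    # normal pages cannot match the resulting traits.
    normal = set().union(*normal_pages)
    return frozenset(shared - normal)


malicious_pages = [
    {"html>body>div", "div>iframe>script", "body>div>span"},
    {"html>body>div", "div>iframe>script"},
]
normal_pages = [{"html>body>div", "body>div>p"}]
traits = build_malicious_traits(malicious_pages, normal_pages)
```

The returned `frozenset` is immutable, which suits a value that is computed once on the server and then handed unchanged to many endpoints.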
Specification