Systems and methods for detecting suspicious web pages
First Claim
1. A computer-implemented method for detecting suspicious web pages, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
identifying a plurality of malicious web pages;
training a classification model for identifying suspicious web pages that comprises one or more classification algorithms using:
  features of the plurality of malicious web pages;
  features of a web-page link graph that comprises at least:
    a plurality of nodes, wherein each node within the plurality of nodes represents one of the plurality of malicious web pages;
    a plurality of edges that join the plurality of nodes and that represent links between web pages represented within the web-page link graph, wherein the one or more classification algorithms are configured to classify web pages as suspicious;
identifying a website after the classification model has been trained;
classifying a first web page of the website and a second web page of the website as suspicious using the classification model;
determining that a probability of maliciousness of the first web page is greater than a probability of maliciousness of the second web page;
in response to classifying the first web page and the second web page as suspicious and based at least in part on the probability of maliciousness of the first web page being greater than the probability of maliciousness of the second web page, selectively applying heavy analysis to the first web page and the second web page in order to conserve system resources of a monitored computer environment by:
  executing the first web page within the monitored computer environment to determine whether the first web page is malicious;
  refraining from executing the second web page within the monitored computer environment to determine whether the second web page is malicious;
detecting a malicious behavior of the first web page resulting from executing the first web page;
classifying the website as malicious based on detecting the malicious behavior of the first web page;
when the website is classified as malicious, updating the classification model by updating the web-page link graph and the one or more classification algorithms based at least in part on the website having been classified as malicious.
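The selective heavy-analysis step of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `selective_heavy_analysis`, the `Verdict` record, the `budget` parameter, and the `run_in_sandbox` callable (standing in for execution within the monitored computer environment) are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Verdict:
    """Outcome of selective analysis for one website (hypothetical record)."""
    website_malicious: bool
    executed: List[str]
    skipped: List[str]


def selective_heavy_analysis(
    page_probs: Dict[str, float],
    run_in_sandbox: Callable[[str], bool],
    budget: int = 1,
) -> Verdict:
    """Execute only the most probably malicious suspicious pages.

    page_probs maps each suspicious page URL to its probability of
    maliciousness. run_in_sandbox is an assumed callable that executes a
    page in the monitored environment and returns True when malicious
    behavior is observed. budget caps how many pages are executed.
    """
    # Rank suspicious pages from most to least probably malicious.
    ranked = sorted(page_probs, key=page_probs.get, reverse=True)

    executed: List[str] = []
    malicious = False
    for url in ranked[:budget]:
        executed.append(url)
        if run_in_sandbox(url):
            # One malicious page is enough to classify the whole website.
            malicious = True
            break

    # Every page that was not executed is skipped to conserve resources.
    skipped = [url for url in ranked if url not in executed]
    return Verdict(malicious, executed, skipped)
```

With two suspicious pages and the default budget of one, only the higher-probability page is executed and the other is skipped, which mirrors the first-page and second-page language of the claim.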
Abstract
A computer-implemented method for detecting suspicious web pages. The method may include 1) identifying a plurality of malicious web pages; 2) establishing a classification model for identifying suspicious web pages, the classification model being based at least in part on the plurality of malicious web pages; 3) identifying an additional web page; 4) classifying the additional web page as suspicious using the classification model; 5) analyzing the additional web page to determine whether the additional web page is malicious; 6) determining that the additional web page is malicious based on the analysis; and 7) updating the classification model based at least in part on the determination. Various other methods, systems, and computer-readable media are also disclosed.
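The seven steps of the abstract form a feedback loop, sketched below in Python under simplifying assumptions. The `ToyModel` class is a hypothetical stand-in for the classification model: it scores a page by the fraction of its feature tokens already seen on known malicious pages, which is an illustrative choice and not a technique stated in the source. The `analyze` callable stands in for whatever analysis determines that a page is actually malicious.

```python
from typing import Callable, Iterable, Set


class ToyModel:
    """Hypothetical stand-in for the classification model."""

    def __init__(self, malicious_pages: Iterable[Set[str]], threshold: float = 0.5):
        # Steps 1-2: build the model from features of known malicious pages.
        self.bad_tokens: Set[str] = set()
        for features in malicious_pages:
            self.bad_tokens |= features
        self.threshold = threshold

    def score(self, features: Set[str]) -> float:
        # Fraction of the page's tokens previously seen on malicious pages.
        if not features:
            return 0.0
        return len(features & self.bad_tokens) / len(features)

    def is_suspicious(self, features: Set[str]) -> bool:
        return self.score(features) >= self.threshold

    def update(self, features: Set[str]) -> None:
        # Step 7: fold the newly confirmed malicious page into the model.
        self.bad_tokens |= features


def process_page(
    model: ToyModel,
    features: Set[str],
    analyze: Callable[[Set[str]], bool],
) -> bool:
    """Run steps 3-7 for one additional page; return True if it is malicious."""
    if not model.is_suspicious(features):   # step 4: classify as suspicious
        return False
    if not analyze(features):               # steps 5-6: analyze and decide
        return False
    model.update(features)                  # step 7: update the model
    return True
```

In this sketch, a confirmed malicious page contributes its previously unseen tokens to the model, so later pages sharing those tokens score higher.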
20 Claims
1. A computer-implemented method for detecting suspicious web pages, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
identifying a plurality of malicious web pages;
training a classification model for identifying suspicious web pages that comprises one or more classification algorithms using:
  features of the plurality of malicious web pages;
  features of a web-page link graph that comprises at least:
    a plurality of nodes, wherein each node within the plurality of nodes represents one of the plurality of malicious web pages;
    a plurality of edges that join the plurality of nodes and that represent links between web pages represented within the web-page link graph, wherein the one or more classification algorithms are configured to classify web pages as suspicious;
identifying a website after the classification model has been trained;
classifying a first web page of the website and a second web page of the website as suspicious using the classification model;
determining that a probability of maliciousness of the first web page is greater than a probability of maliciousness of the second web page;
in response to classifying the first web page and the second web page as suspicious and based at least in part on the probability of maliciousness of the first web page being greater than the probability of maliciousness of the second web page, selectively applying heavy analysis to the first web page and the second web page in order to conserve system resources of a monitored computer environment by:
  executing the first web page within the monitored computer environment to determine whether the first web page is malicious;
  refraining from executing the second web page within the monitored computer environment to determine whether the second web page is malicious;
detecting a malicious behavior of the first web page resulting from executing the first web page;
classifying the website as malicious based on detecting the malicious behavior of the first web page;
when the website is classified as malicious, updating the classification model by updating the web-page link graph and the one or more classification algorithms based at least in part on the website having been classified as malicious.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
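The web-page link graph recited in the method claim can be pictured with a small Python sketch that derives per-page features from nodes and edges. The claim does not say which graph features are used, so the four features below (in-degree, out-degree, and counts of links to and from known malicious pages) are assumptions chosen for illustration, as is the function name `link_graph_features`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def link_graph_features(
    edges: Iterable[Tuple[str, str]],
    malicious: Set[str],
    page: str,
) -> Dict[str, int]:
    """Compute illustrative link-graph features for one page.

    edges is an iterable of (source, target) links between web pages.
    malicious is the set of pages already known to be malicious.
    """
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)

    outs, ins = out_links[page], in_links[page]
    return {
        "out_degree": len(outs),                # pages this page links to
        "in_degree": len(ins),                  # pages linking to this page
        "malicious_out": len(outs & malicious),  # links to known-bad pages
        "malicious_in": len(ins & malicious),    # links from known-bad pages
    }
```

A feature vector of this kind could be passed to whichever classification algorithm is in use. When a website is later classified as malicious, adding its pages to `malicious` and its links to `edges` corresponds to the graph update in the final step of the claim.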
14. A system for detecting suspicious web pages, the system comprising:
an identification module programmed to identify a plurality of malicious web pages;
an establishing module programmed to train a classification model for identifying suspicious web pages that comprises one or more classification algorithms using:
  features of the plurality of malicious web pages;
  features of a web-page link graph that comprises at least:
    a plurality of nodes, wherein each node within the plurality of nodes represents one of the plurality of malicious web pages;
    a plurality of edges that join the plurality of nodes and that represent links between web pages represented within the web-page link graph, wherein:
      the one or more classification algorithms are configured to classify web pages as suspicious;
      the identification module is further programmed to identify, after the classification model has been trained, a website;
a classification module programmed to classify a first web page of the website and a second web page of the website as suspicious using the classification model;
an analyzation module programmed to:
  determine, when the first web page and the second web page are classified as suspicious, that a probability of maliciousness of the first web page is greater than a probability of maliciousness of the second web page;
  selectively apply, based at least in part on the probability of maliciousness of the first web page being greater than the probability of maliciousness of the second web page, heavy analysis to the first web page and the second web page in order to conserve system resources of a monitored computer environment by:
    executing the first web page within the monitored computer environment to determine whether the first web page is malicious;
    refraining from executing the second web page within the monitored computer environment to determine whether the second web page is malicious;
  detect a malicious behavior of the first web page resulting from executing the first web page;
a determination module programmed to classify the website as malicious based on the detected malicious behavior;
an updating module programmed to update, when the website is classified as malicious, the classification model by updating the web-page link graph and the one or more classification algorithms based at least in part on the website having been classified as malicious;
at least one hardware processor configured to execute the identification module, the establishing module, the classification module, the analyzation module, the determination module, and the updating module.
Dependent claims: 15, 16, 17, 18, 19
20. A non-transitory computer-readable-storage medium comprising one or more computer-executable instructions that, when executed by a computing device, cause the computing device to:
identify a plurality of malicious web pages;
train a classification model for identifying suspicious web pages that comprises one or more classification algorithms using:
  features of the plurality of malicious web pages;
  features of a web-page link graph that comprises at least:
    a plurality of nodes, wherein each node within the plurality of nodes represents one of the plurality of malicious web pages;
    a plurality of edges that join the plurality of nodes and that represent links between web pages represented within the web-page link graph, wherein the one or more classification algorithms are configured to classify web pages as suspicious;
identify a website after the classification model has been trained;
classify a first web page of the website and a second web page of the website as suspicious using the classification model;
determine that a probability of maliciousness of the first web page is greater than a probability of maliciousness of the second web page;
in response to classifying the first web page and the second web page as suspicious and based at least in part on the probability of maliciousness of the first web page being greater than the probability of maliciousness of the second web page, selectively apply heavy analysis to the first web page and the second web page in order to conserve system resources of a monitored computer environment by:
  executing the first web page within the monitored computer environment to determine whether the first web page is malicious;
  refraining from executing the second web page within the monitored computer environment to determine whether the second web page is malicious;
detect a malicious behavior of the first web page resulting from executing the first web page;
classify the website as malicious based on detecting the malicious behavior;
when the website is classified as malicious, update the classification model by updating the web-page link graph and the one or more classification algorithms based at least in part on the website having been classified as malicious.
Specification