Filter-based identification of malicious websites
Abstract
A candidate suspicious website is identified. A plurality of lightweight features associated with the candidate suspicious website is identified. A filter score is determined based on the plurality of lightweight features, wherein the filter score indicates a likelihood that the candidate suspicious website is a malicious website. Whether the filter score exceeds a threshold is determined. Responsive at least in part to the filter score exceeding the threshold, it is determined that the candidate suspicious website is a suspicious website. Whether the suspicious website is a malicious website is determined by identifying software downloaded to the computing system responsive to accessing the suspicious website and determining whether the software downloaded to the computing system is malware based on characteristics associated with the downloaded software.
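The two-stage flow in the abstract (a cheap filter score compared against a threshold, then a more expensive malware check only for sites that pass) might be sketched roughly as follows. This is an illustrative sketch, not the patented implementation: the function name `find_malicious`, the callables `filter_score` and `downloads_malware`, the default threshold of 0.5, and the toy data are all assumptions introduced here.

```python
from typing import Callable, Iterable, List


def find_malicious(
    candidates: Iterable[str],
    filter_score: Callable[[str], float],
    downloads_malware: Callable[[str], bool],
    threshold: float = 0.5,  # assumed value; the abstract does not give one
) -> List[str]:
    """Return candidates that exceed the filter threshold and fail the deep check."""
    malicious = []
    for url in candidates:
        # Stage 1: cheap filter. Sites that do not exceed the threshold are skipped.
        if filter_score(url) <= threshold:
            continue
        # Stage 2: expensive check, run only for sites deemed suspicious.
        if downloads_malware(url):
            malicious.append(url)
    return malicious


# Toy stand-ins (hypothetical scores and scan outcomes, not real data).
toy_scores = {"a.example": 0.9, "b.example": 0.2, "c.example": 0.7}
toy_malware = {"a.example": True, "b.example": True, "c.example": False}
result = find_malicious(toy_scores, toy_scores.get, toy_malware.get)
print(result)
```

Here `b.example` never reaches the expensive check because its score is below the threshold, which is the cost saving the filter is meant to provide.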
16 Claims
1. A method of identifying malicious websites, the method comprising:
identifying a candidate suspicious website;
identifying a plurality of lightweight features associated with the candidate suspicious website;
identifying a dataset comprising a plurality of lightweight features associated with a plurality of known malicious websites and a plurality of lightweight features associated with a plurality of known innocuous websites;
generating a filter classifier comprising a statistical model including weights for the plurality of lightweight features associated with the plurality of known malicious websites and the plurality of lightweight features associated with the plurality of known innocuous websites that distinguish the plurality of known malicious websites from the plurality of known innocuous websites;
determining, with the weights of the generated filter classifier, a continuous filter score for the candidate suspicious website based on the plurality of lightweight features associated with the candidate suspicious website, the continuous filter score indicating similarity between the lightweight features associated with the candidate suspicious website and the lightweight features of the known malicious websites;
prioritizing a scan of the candidate suspicious website relative to other candidate suspicious websites in response to the continuous filter score for the candidate suspicious website and continuous filter scores for the other candidate suspicious websites;
determining whether the candidate suspicious website is a malicious website responsive at least in part to the scan;
updating, in response to determining that the suspicious website is a malicious website, the plurality of lightweight features associated with the plurality of known malicious websites in the dataset to include the plurality of lightweight features associated with the suspicious website; and
re-generating the filter classifier to update the statistical model to include at least one modified weight for the plurality of lightweight features based on the updated dataset.
Dependent claims: 2, 3, 4, 5, 6.
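To make the generate, score, update, and re-generate steps of claim 1 concrete, here is a minimal sketch that assumes the statistical model is a logistic regression fitted by stochastic gradient descent. The claim only requires a statistical model with weights, so that choice is an assumption, as are the feature names (`has_iframe`, `young_domain`, `obfuscated_js`), the hyperparameters, and the toy dataset.

```python
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]


def score(weights: Dict[str, float], features: Features) -> float:
    """Continuous filter score in (0, 1); higher means closer to known malicious sites."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def generate_classifier(dataset: List[Tuple[Features, int]],
                        epochs: int = 200, lr: float = 0.5) -> Dict[str, float]:
    """Fit one weight per feature by stochastic gradient descent on log loss."""
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        for features, label in dataset:  # label: 1 = malicious, 0 = innocuous
            error = score(weights, features) - label
            for name, value in features.items():
                weights[name] = weights.get(name, 0.0) - lr * error * value
    return weights


# Toy dataset (hypothetical features, not real measurements).
dataset = [
    ({"bias": 1, "has_iframe": 1, "young_domain": 1}, 1),
    ({"bias": 1, "has_iframe": 1, "young_domain": 0}, 1),
    ({"bias": 1, "has_iframe": 0, "young_domain": 0}, 0),
    ({"bias": 1, "has_iframe": 0, "young_domain": 1}, 0),
]
weights = generate_classifier(dataset)

# A site the scan later confirms as malicious, carrying a feature not seen before.
confirmed = {"bias": 1, "has_iframe": 0, "young_domain": 1, "obfuscated_js": 1}
score_before = score(weights, confirmed)

# Update the dataset with the confirmed site, then re-generate the classifier.
dataset.append((confirmed, 1))
new_weights = generate_classifier(dataset)
score_after = score(new_weights, confirmed)
```

After re-generation the model carries a weight for `obfuscated_js` that the first model lacked, which is one way the "at least one modified weight" language could be satisfied.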
7. A computer system for identifying malicious websites, the system comprising:
a non-transitory computer-readable storage medium storing executable computer program instructions comprising:
a web crawler module adapted to:
identify a candidate suspicious website; and
identify a plurality of lightweight features associated with the candidate suspicious website;
a filter module adapted to:
identify a dataset comprising a plurality of lightweight features associated with a plurality of known malicious websites and a plurality of lightweight features associated with a plurality of known innocuous websites;
generate a filter classifier comprising a statistical model including weights for the plurality of lightweight features associated with the plurality of known malicious websites and the plurality of lightweight features associated with the plurality of known innocuous websites that distinguish the plurality of known malicious websites from the plurality of known innocuous websites; and
determine, with the weights of the generated filter classifier, a continuous filter score for the candidate suspicious website based on the plurality of lightweight features associated with the candidate suspicious website, the continuous filter score indicating similarity between the lightweight features associated with the candidate suspicious website and the lightweight features of the known malicious websites;
a malicious website scanning module adapted to:
prioritize a scan of the candidate suspicious website relative to other candidate suspicious websites in response to the continuous filter score for the candidate suspicious website and continuous filter scores for the other candidate suspicious websites; and
determine whether the candidate suspicious website is a malicious website responsive at least in part to the scan;
wherein the filter module is further adapted to:
update, in response to determining that the suspicious website is a malicious website, the plurality of lightweight features associated with the plurality of known malicious websites in the dataset to include the plurality of lightweight features associated with the suspicious website, and wherein the filter classifier is re-generated to update the statistical model to include at least one modified weight for the plurality of lightweight features based on the updated dataset; and
a processor for executing the computer program instructions.
Dependent claims: 8, 9, 10, 11, 12.
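The division of work among the three modules in claim 7 could be sketched as three small classes. This is a sketch under stated assumptions rather than the claimed system: the class and method names are invented for illustration, the crawler reads from a toy lookup instead of fetching pages, the scanner's verdict is hard-coded, and the filter uses smoothed log-odds weights (a naive-Bayes-style model) as one possible statistical model.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class WebCrawlerModule:
    """Identifies candidates and their lightweight features (toy lookup, no real crawling)."""
    pages: Dict[str, Set[str]]  # url -> names of lightweight features present

    def candidates(self) -> List[str]:
        return list(self.pages)

    def features(self, url: str) -> Set[str]:
        return self.pages[url]


@dataclass
class FilterModule:
    """Holds the dataset and the weights generated from it."""
    malicious: List[Set[str]]
    innocuous: List[Set[str]]
    weights: Dict[str, float] = field(default_factory=dict)

    def generate(self) -> None:
        """Weight each feature by its Laplace-smoothed log-odds of appearing on malicious sites."""
        names = set().union(*self.malicious, *self.innocuous)
        self.weights = {}
        for name in names:
            p_mal = (sum(name in s for s in self.malicious) + 1) / (len(self.malicious) + 2)
            p_ok = (sum(name in s for s in self.innocuous) + 1) / (len(self.innocuous) + 2)
            self.weights[name] = math.log(p_mal / p_ok)

    def score(self, features: Set[str]) -> float:
        """Continuous filter score in (0, 1)."""
        z = sum(self.weights.get(name, 0.0) for name in features)
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features: Set[str]) -> None:
        """Add a confirmed malicious site to the dataset and re-generate the weights."""
        self.malicious.append(features)
        self.generate()


@dataclass
class ScanningModule:
    scan: Callable[[str], bool]  # expensive check; True means malicious

    def prioritize(self, scores: Dict[str, float]) -> List[str]:
        return sorted(scores, key=lambda url: scores[url], reverse=True)


# Wiring with toy data (hypothetical URLs, features, and scan verdicts).
crawler = WebCrawlerModule(pages={
    "x.example": {"hidden_iframe", "new_domain"},
    "y.example": {"has_contact_page"},
    "z.example": {"new_domain"},
})
filt = FilterModule(
    malicious=[{"hidden_iframe", "new_domain"}, {"hidden_iframe"}],
    innocuous=[{"has_contact_page"}, {"has_contact_page", "new_domain"}],
)
filt.generate()
scanner = ScanningModule(scan=lambda url: url == "x.example")

scores = {url: filt.score(crawler.features(url)) for url in crawler.candidates()}
order = scanner.prioritize(scores)
for url in order:
    if scanner.scan(url):
        filt.update(crawler.features(url))  # feedback loop into the filter
```

Keeping the feedback path inside `FilterModule.update` mirrors the claim's wording that the filter module itself is "further adapted to" update the dataset.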
13. A non-transitory computer-readable storage medium encoded with executable program code for identifying malicious websites, the program code comprising program code for:
identifying a candidate suspicious website;
identifying a plurality of lightweight features associated with the candidate suspicious website;
identifying a dataset comprising a plurality of lightweight features associated with a plurality of known malicious websites and a plurality of lightweight features associated with a plurality of known innocuous websites;
generating a filter classifier comprising a statistical model including weights for the plurality of lightweight features associated with the plurality of known malicious websites and the plurality of lightweight features associated with the plurality of known innocuous websites that distinguish the plurality of known malicious websites from the plurality of known innocuous websites;
determining, with the weights of the generated filter classifier, a continuous filter score for the candidate suspicious website based on the plurality of lightweight features associated with the candidate suspicious website, the continuous filter score indicating similarity between the lightweight features associated with the candidate suspicious website and the lightweight features of the known malicious websites;
prioritizing a scan of the candidate suspicious website relative to other candidate suspicious websites in response to the continuous filter score for the candidate suspicious website and continuous filter scores for the other candidate suspicious websites;
determining whether the candidate suspicious website is a malicious website responsive at least in part to the scan;
updating, in response to determining that the suspicious website is a malicious website, the plurality of lightweight features associated with the plurality of known malicious websites in the dataset to include the plurality of lightweight features associated with the suspicious website; and
re-generating the filter classifier to update the statistical model to include at least one modified weight for the plurality of lightweight features based on the updated dataset.
Dependent claims: 14, 15, 16.
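The prioritizing step, in which each candidate's scan is ordered relative to the others by continuous filter score, can be illustrated with a priority queue. The sketch below is one possible reading: the function name `prioritized_scan_order`, the `budget` parameter (a cap on how many scans are run), and the toy scores are assumptions not found in the claim.

```python
import heapq
from typing import Dict, List, Tuple


def prioritized_scan_order(scores: Dict[str, float], budget: int) -> List[str]:
    """Return up to `budget` URLs, highest continuous filter score first."""
    # heapq is a min-heap, so scores are negated to pop the highest score first.
    heap: List[Tuple[float, str]] = [(-s, url) for url, s in scores.items()]
    heapq.heapify(heap)
    order: List[str] = []
    while heap and len(order) < budget:
        _, url = heapq.heappop(heap)
        order.append(url)
    return order


# Toy scores (hypothetical values, not real measurements).
toy_scores = {"a.example": 0.31, "b.example": 0.94, "c.example": 0.58, "d.example": 0.07}
scan_first = prioritized_scan_order(toy_scores, budget=2)
print(scan_first)
```

Because the score is continuous rather than a yes/no label, it gives a total ordering of candidates, so a limited scanning budget can be spent on the sites most similar to known malicious ones.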
Specification