Systems and methods for detecting suspicious web pages

US 9,356,941 B1
Filed: 08/16/2010
Issued: 05/31/2016
Est. Priority Date: 08/16/2010
Status: Active Grant

6 Assignments

0 Petitions

Abstract

A computer-implemented method for detecting suspicious web pages. The method may include 1) identifying a plurality of malicious web pages; 2) establishing a classification model for identifying suspicious web pages, the classification model being based at least in part on the plurality of malicious web pages; 3) identifying an additional web page; 4) classifying the additional web page as suspicious using the classification model; 5) analyzing the additional web page to determine whether the additional web page is malicious; 6) determining that the additional web page is malicious based on the analysis; and 7) updating the classification model based at least in part on the determination. Various other methods, systems, and computer-readable media are also disclosed.
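The feedback loop in the abstract (classify, analyze, update) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `KeywordModel` class, the feature strings, the `min_overlap` parameter, and the `analyze` callback that stands in for whatever analysis decides maliciousness.

```python
from typing import Callable, Iterable, Set


class KeywordModel:
    """Toy stand-in for a classification model (hypothetical, not the patent's).

    A page is treated as suspicious when it shares at least `min_overlap`
    features with pages already known to be malicious.
    """

    def __init__(self, malicious_pages: Iterable[Set[str]], min_overlap: int = 2):
        self.known_features: Set[str] = set()
        for features in malicious_pages:
            self.known_features |= features
        self.min_overlap = min_overlap

    def is_suspicious(self, features: Set[str]) -> bool:
        return len(features & self.known_features) >= self.min_overlap

    def update(self, features: Set[str]) -> None:
        # Fold the features of a newly confirmed malicious page into the model.
        self.known_features |= features


def process_page(model: KeywordModel,
                 features: Set[str],
                 analyze: Callable[[Set[str]], bool]) -> str:
    """Classify an additional page, analyze it if suspicious, update on a hit."""
    if not model.is_suspicious(features):
        return "not suspicious"
    if analyze(features):
        model.update(features)
        return "malicious"
    return "suspicious but not confirmed"
```

In this sketch a confirmed page widens the set of features the model recognizes, so later pages that reuse the newly seen features can be flagged as well.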

76 Citations

20 Claims

1. A computer-implemented method for detecting suspicious web pages, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
- identifying a plurality of malicious web pages;
- training a classification model for identifying suspicious web pages that comprises one or more classification algorithms using:
  - features of the plurality of malicious web pages;
  - features of a web-page link graph that comprises at least:
    - a plurality of nodes, wherein each node within the plurality of nodes represents one of the plurality of malicious web pages;
    - a plurality of edges that join the plurality of nodes and that represent links between web pages represented within the web-page link graph, wherein the one or more classification algorithms are configured to classify web pages as suspicious;
- identifying a website after the classification model has been trained;
- classifying a first web page of the website and a second web page of the website as suspicious using the classification model;
- determining that a probability of maliciousness of the first web page is greater than a probability of maliciousness of the second web page;
- in response to classifying the first web page and the second web page as suspicious and based at least in part on the probability of maliciousness of the first web page being greater than the probability of maliciousness of the second web page, selectively applying heavy analysis to the first web page and the second web page in order to conserve system resources of a monitored computer environment by:
  - executing the first web page within the monitored computer environment to determine whether the first web page is malicious;
  - refraining from executing the second web page within the monitored computer environment to determine whether the second web page is malicious;
- detecting a malicious behavior of the first web page resulting from executing the first web page;
- classifying the website as malicious based on detecting the malicious behavior of the first web page;
- when the website is classified as malicious, updating the classification model by updating the web-page link graph and the one or more classification algorithms based at least in part on the website having been classified as malicious.

2. The computer-implemented method of claim 1, wherein classifying the first web page as suspicious using the classification model comprises classifying the first web page as suspicious based on at least one of the following:
- a content feature of the first web page;
- a uniform resource locator feature of the first web page;
- a link between the first web page and the plurality of malicious web pages, wherein the link is represented by an edge within the web-page link graph.

3. The computer-implemented method of claim 2, wherein:
- classifying the first web page as suspicious using the classification model comprises classifying the first web page as suspicious based on the link between the first web page and the plurality of malicious web pages;
- classifying the first web page as suspicious based on the link between the first web page and the plurality of malicious web pages comprises:
  - using the web-page link graph to identify a set of direct links between the first web page and the plurality of malicious web pages;
  - using the web-page link graph to identify a set of indirect links between the first web page and the plurality of malicious web pages;
  - calculating a suspicious link score for the first web page based on the set of direct links and the set of indirect links;
  - classifying the first web page as suspicious based at least in part on the suspicious link score.

4. The computer-implemented method of claim 1, wherein:
- the one or more classification algorithms comprise a set of classifiers;
- training the classification model for identifying suspicious web pages comprises training the set of classifiers, wherein each classifier in the set of classifiers is configured to independently classify the first web page.

5. The computer-implemented method of claim 4, wherein classifying the first web page as suspicious using the classification model comprises:
- determining, for each classifier in the set of classifiers, a classification for the first web page;
- combining the classifications of each classifier in the set of classifiers.

6. The computer-implemented method of claim 5, wherein updating the one or more classification algorithms comprises:
- generating an additional classifier based at least in part on the first web page and the web-page link graph;
- adding the additional classifier to the set of classifiers.

7. The computer-implemented method of claim 6, wherein generating the additional classifier based at least in part on the first web page comprises:
- identifying a set of newly classified malicious web pages;
- adding the first web page to the set of newly classified malicious web pages;
- identifying a set of malicious features, wherein each malicious feature in the set of malicious features comprises a feature of at least one web page in the set of newly classified malicious web pages;
- generating the additional classifier based at least in part on the set of malicious features.

8. The computer-implemented method of claim 4, further comprising:
- periodically identifying an expired classifier;
- removing the expired classifier from the set of classifiers.

9. The computer-implemented method of claim 1, wherein updating the one or more classification algorithms comprises adapting at least one of the one or more classification algorithms based at least in part on the first web page.

10. The computer-implemented method of claim 1, wherein:
- training the classification model for identifying suspicious web pages comprises configuring the classification model to determine the probability that the first web page is malicious;
- classifying the first web page as suspicious using the classification model comprises using the classification model to determine that the probability that the first web page is malicious is above a predetermined threshold.

11. The computer-implemented method of claim 1, wherein updating the one or more classification algorithms comprises using the first web page to retrain at least one of the one or more classification algorithms.

12. The computer-implemented method of claim 1, further comprising classifying the first web page as malicious based on detecting the malicious behavior.

13. The computer-implemented method of claim 1, wherein identifying the website comprises:
- crawling the website;
- adding an additional node to the web-page link graph for each web page of the website;
- adding an additional edge to the web-page link graph for each link between a web page of the website and a web page in the plurality of malicious web pages.
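
Claim 3 recites a suspicious link score computed from direct and indirect links to known malicious pages but does not give a formula. The following Python sketch shows one possible scoring rule under stated assumptions: the graph is an adjacency mapping, each malicious page reached contributes `decay ** (hops - 1)`, and the search stops after `max_hops`. The function name, the decay weighting, and both parameters are hypothetical choices, not taken from the claims.

```python
from collections import deque
from typing import Dict, List, Set


def suspicious_link_score(graph: Dict[str, List[str]],
                          page: str,
                          malicious: Set[str],
                          decay: float = 0.5,
                          max_hops: int = 3) -> float:
    """Breadth-first walk of the link graph starting at `page`.

    A directly linked malicious page adds 1.0; a malicious page reached
    through one intermediate page adds `decay`, through two adds
    `decay ** 2`, and so on (an assumed weighting).
    """
    score = 0.0
    seen = {page}
    queue = deque([(page, 0)])  # (node, number of hops taken to reach it)
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow links beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            if neighbor in malicious:
                score += decay ** hops  # hops == 0 means a direct link
            queue.append((neighbor, hops + 1))
    return score


# Example: "p" links directly to "m1" and indirectly (via "a") to "m2".
example_graph = {"p": ["m1", "a"], "a": ["m2"], "m1": [], "m2": []}
example_score = suspicious_link_score(example_graph, "p", {"m1", "m2"})
```

A page could then be classified as suspicious when the score exceeds a chosen cutoff; the cutoff itself would be another assumption layered on top of this sketch.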

14. A system for detecting suspicious web pages, the system comprising:
- an identification module programmed to identify a plurality of malicious web pages;
- an establishing module programmed to train a classification model for identifying suspicious web pages that comprises one or more classification algorithms using:
  - features of the plurality of malicious web pages;
  - features of a web-page link graph that comprises at least:
    - a plurality of nodes, wherein each node within the plurality of nodes represents one of the plurality of malicious web pages;
    - a plurality of edges that join the plurality of nodes and that represent links between web pages represented within the web-page link graph, wherein:
      - the one or more classification algorithms are configured to classify web pages as suspicious;
      - the identification module is further programmed to identify, after the classification model has been trained, a website;
- a classification module programmed to classify a first web page of the website and a second web page of the website as suspicious using the classification model;
- an analyzation module programmed to:
  - determine, when the first web page and the second web page are classified as suspicious, that a probability of maliciousness of the first web page is greater than a probability of maliciousness of the second web page;
  - selectively apply, based at least in part on the probability of maliciousness of the first web page being greater than the probability of maliciousness of the second web page, heavy analysis to the first web page and the second web page in order to conserve system resources of a monitored computer environment by:
    - executing the first web page within the monitored computer environment to determine whether the first web page is malicious;
    - refraining from executing the second web page within the monitored computer environment to determine whether the second web page is malicious;
  - detect a malicious behavior of the first web page resulting from executing the first web page;
- a determination module programmed to classify the website as malicious based on the detected malicious behavior;
- an updating module programmed to update, when the website is classified as malicious, the classification model by updating the web-page link graph and the one or more classification algorithms based at least in part on the website having been classified as malicious;
- at least one hardware processor configured to execute the identification module, the establishing module, the classification module, the analyzation module, the determination module, and the updating module.

15. The system of claim 14, wherein the classification module classifies the first web page as suspicious using the classification model by classifying the first web page as suspicious based on at least one of the following:
- a content feature of the first web page;
- a uniform resource locator feature of the first web page;
- a link between the first web page and the plurality of malicious web pages, wherein the link is represented by an edge within the web-page link graph.

16. The system of claim 15, wherein:
- the classification module classifies the first web page as suspicious using the classification model by classifying the first web page as suspicious based on the link between the first web page and the plurality of malicious web pages;
- the classification module classifies the first web page as suspicious based on the link between the first web page and the plurality of malicious web pages by:
  - using the web-page link graph to identify a set of direct links between the first web page and the plurality of malicious web pages;
  - using the web-page link graph to identify a set of indirect links between the first web page and the plurality of malicious web pages;
  - calculating a suspicious link score for the first web page based on the set of direct links and the set of indirect links;
  - classifying the first web page as suspicious based at least in part on the suspicious link score.

17. The system of claim 14, wherein:
- the one or more classification algorithms comprise a set of classifiers;
- the establishing module trains the classification model for identifying suspicious web pages by training the set of classifiers, wherein each classifier in the set of classifiers is configured to independently classify the first web page.

18. The system of claim 17, wherein the classification module classifies the first web page as suspicious using the classification model by:
- determining, for each classifier in the set of classifiers, a classification for the first web page;
- combining the classifications of each classifier in the set of classifiers.

19. The system of claim 18, wherein the updating module updates the one or more classification algorithms by:
- generating an additional classifier based at least in part on the first web page and the web-page link graph;
- adding the additional classifier to the set of classifiers.
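
Claims 17 to 19 (and method claims 4 to 8) describe a set of classifiers that each classify a page independently, whose classifications are combined, and to which classifiers can be added or from which expired ones can be removed. The Python sketch below is one possible reading, with several assumptions: combination is a plain average of per-classifier probabilities, an added classifier simply checks for overlap with a set of malicious features, and expiry is a timestamp field. The names `Classifier`, `Ensemble`, and `expires_at` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Set


@dataclass
class Classifier:
    name: str
    predict: Callable[[Set[str]], float]  # returns a probability in [0, 1]
    expires_at: float                     # assumed expiry timestamp


class Ensemble:
    def __init__(self, classifiers: Iterable[Classifier]):
        self.classifiers: List[Classifier] = list(classifiers)

    def probability(self, features: Set[str]) -> float:
        """Each classifier votes independently; votes are averaged (assumed rule)."""
        votes = [c.predict(features) for c in self.classifiers]
        return sum(votes) / len(votes) if votes else 0.0

    def add_from_features(self, name: str,
                          malicious_features: Iterable[str],
                          expires_at: float) -> None:
        """Build a simple classifier from features of newly confirmed pages."""
        known: FrozenSet[str] = frozenset(malicious_features)

        def predict(features: Set[str], known: FrozenSet[str] = known) -> float:
            return 1.0 if features & known else 0.0

        self.classifiers.append(Classifier(name, predict, expires_at))

    def remove_expired(self, now: float) -> None:
        """Drop classifiers whose expiry time has passed."""
        self.classifiers = [c for c in self.classifiers if c.expires_at > now]
```

Averaging is only one way to combine classifications; weighted voting or a maximum over classifiers would fit the same interface.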

20. A non-transitory computer-readable-storage medium comprising one or more computer-executable instructions that, when executed by a computing device, cause the computing device to:
- identify a plurality of malicious web pages;
- train a classification model for identifying suspicious web pages that comprises one or more classification algorithms using:
  - features of the plurality of malicious web pages;
  - features of a web-page link graph that comprises at least:
    - a plurality of nodes, wherein each node within the plurality of nodes represents one of the plurality of malicious web pages;
    - a plurality of edges that join the plurality of nodes and that represent links between web pages represented within the web-page link graph, wherein the one or more classification algorithms are configured to classify web pages as suspicious;
- identify a website after the classification model has been trained;
- classify a first web page of the website and a second web page of the website as suspicious using the classification model;
- determine that a probability of maliciousness of the first web page is greater than a probability of maliciousness of the second web page;
- in response to classifying the first web page and the second web page as suspicious and based at least in part on the probability of maliciousness of the first web page being greater than the probability of maliciousness of the second web page, selectively apply heavy analysis to the first web page and the second web page in order to conserve system resources of a monitored computer environment by:
  - executing the first web page within the monitored computer environment to determine whether the first web page is malicious;
  - refraining from executing the second web page within the monitored computer environment to determine whether the second web page is malicious;
- detect a malicious behavior of the first web page resulting from executing the first web page;
- classify the website as malicious based on detecting the malicious behavior;
- when the website is classified as malicious, update the classification model by updating the web-page link graph and the one or more classification algorithms based at least in part on the website having been classified as malicious.
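
The selective heavy-analysis step shared by the independent claims can be sketched in Python as follows. This is an illustration under assumptions, not the claimed implementation: pages are keyed by URL, a page counts as suspicious when its probability exceeds `threshold`, only the top `budget` suspicious pages are executed, and `execute_in_sandbox` is a hypothetical callback standing in for execution within the monitored computer environment.

```python
from typing import Callable, Dict, List, Tuple


def select_for_heavy_analysis(probabilities: Dict[str, float],
                              threshold: float = 0.5,
                              budget: int = 1) -> Tuple[List[str], List[str]]:
    """Split suspicious pages into (to_execute, skipped), highest probability first."""
    suspicious = [url for url, p in probabilities.items() if p > threshold]
    ranked = sorted(suspicious, key=lambda url: probabilities[url], reverse=True)
    return ranked[:budget], ranked[budget:]


def classify_website(probabilities: Dict[str, float],
                     execute_in_sandbox: Callable[[str], bool],
                     threshold: float = 0.5,
                     budget: int = 1) -> bool:
    """Execute only the selected pages; the site is malicious if any misbehaves."""
    to_execute, _skipped = select_for_heavy_analysis(probabilities, threshold, budget)
    return any(execute_in_sandbox(url) for url in to_execute)
```

With probabilities of 0.9 and 0.7 for two suspicious pages and a budget of one, only the 0.9 page is executed and the 0.7 page is skipped, which mirrors the executing and refraining elements of the claims.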

Current Assignee: Gen Digital Inc.
Original Assignee: Symantec Corporation (NortonLifeLock Inc.)
Inventors: Kislyuk, Oleg; Gubin, Maxim; Vinnik, Alex
Primary Examiner(s): Zele, Krista
Assistant Examiner(s): Fabbri, Anthony
Application Number: US12/857,119
Time in Patent Office: 2,115 Days
Field of Search: 726/22
US Class Current: 1/1

CPC Class Codes
H04L 63/14   for detecting or protecting...
H04L 63/1408   by monitoring network traff...
H04L 63/1416   Event detection, e.g. attac...
H04L 63/1441   Countermeasures against mal...
H04L 63/145   the attack involving the pr...
H04L 63/1483   service impersonation, e.g....
H04L 63/168   above the transport layer
