System and method of analyzing web content

US 8,978,140 B2
Filed: 06/20/2011
Issued: 03/10/2015
Est. Priority Date: 07/10/2006
Status: Active Grant

Abstract

A system and method are provided for identifying inappropriate content in websites on a network. Unrecognized uniform resource locators (URLs) or other web content are accessed by workstations and are identified as possibly having malicious content. The URLs or web content may be preprocessed within a gateway server module or some other software module to collect additional information related to the URLs. The URLs may be scanned for known attack signatures, and if any are found, they may be tagged as candidate URLs in need of further analysis by a classification module.
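
The signature scan described in the abstract can be illustrated with a minimal sketch. The signature names and patterns below are hypothetical placeholders, not taken from the patent, and `scan_url` and `ScanResult` are illustrative names. The sketch only shows the general idea of tagging a URL as a candidate for further analysis when any known signature matches.

```python
import re
from dataclasses import dataclass, field

# Hypothetical attack signatures, for illustration only.
# A real deployment would load these from a maintained signature database.
ATTACK_SIGNATURES = {
    "script_in_query": re.compile(r"<script\b", re.IGNORECASE),
    "encoded_script": re.compile(r"%3Cscript", re.IGNORECASE),
    "path_traversal": re.compile(r"\.\./\.\./"),
}


@dataclass
class ScanResult:
    url: str
    matched: list = field(default_factory=list)

    @property
    def is_candidate(self) -> bool:
        # A URL becomes a candidate for the classification module
        # as soon as at least one signature matched.
        return bool(self.matched)


def scan_url(url: str) -> ScanResult:
    """Check a URL string against every known signature."""
    matched = [
        name
        for name, pattern in ATTACK_SIGNATURES.items()
        if pattern.search(url)
    ]
    return ScanResult(url=url, matched=matched)
```

A caller would pass each unrecognized URL through `scan_url` and forward only those with `is_candidate` set to the classification step.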

16 Claims

1. A computer-implemented method of categorizing a uniform resource locator (URL) based on web content associated with the URL, the method comprising:
- identifying a first URL using a first collection method of a plurality of collection methods, wherein each of the plurality of collection methods is performed using at least one electronic processor;
  
  determining, using an electronic processor, whether the first URL contains a malicious data element;
  
  categorizing, using an electronic processor, the first URL in response to a determination that the first URL contains a malicious data element;
  
  in response to determining the first URL does not contain a malicious data element:
  
  assigning, using an electronic processor, a first categorization priority to the first URL based on the first URL being identified using the first collection method, and
  
  categorizing, using an electronic processor, the first URL based on the first categorization priority, wherein categorization of a URL comprises assigning a category to the URL based on a classification of at least one of web content or an Internet Protocol (IP) address identified by the URL;
  
  identifying a second URL using a second collection method, wherein the first collection method and the second collection method are different and each are one of a web crawler, a Domain Name Server (DNS) database, and a honey client;
  
  determining, using an electronic processor, whether the second URL contains a malicious data element;
  
  categorizing, using an electronic processor, the second URL in response to a determination that the second URL contains a malicious data element;
  
  in response to determining the second URL does not contain a malicious data element:
  
  assigning, using an electronic processor, a second categorization priority different than the first categorization priority based on the second URL having been identified using the second collection method, and
  
  categorizing, using an electronic processor, the second URL based on the second categorization priority.
- - 2. The computer-implemented method of claim 1, further comprising determining a frequency of requests for the web content associated with the first URL, and prioritizing categorization of the first URL based at least in part on the frequency of requests.
  - 3. The computer-implemented method of claim 2, wherein the time at which the category is determined is based on the frequency.
  - 4. The computer-implemented method of claim 1, further comprising:
    - identifying a third URL using a third collection method, and categorizing the third URL at a different priority than categorization of the first and second URLs based on the third URL having been identified using the third collection method, wherein the third collection method includes one of a known malicious URL received from an external organization, and an email module configured to receive URLs via email.
  - 5. The computer-implemented method of claim 1, further comprising:
    - providing the first URL to a data mining module, the data mining module in communication with a plurality of collection sources, the plurality of collection sources implementing the plurality of collection methods, and comprising asynchronous processes.
  - 6. The computer-implemented method of claim 5, further comprising configuring the data mining module, wherein configuring the data mining module includes defining a characteristic indicative of a targeted attribute, and configuring the data mining module to identify requests having the attribute.
  - 7. The computer-implemented method of claim 6, wherein the targeted attribute comprises at least one of keywords, regular expressions, or operands.
  - 8. The computer-implemented method of claim 6, wherein the attribute is a type of HTTP request header data.
  - 9. The computer-implemented method of claim 8, wherein the HTTP request header data includes a content-type.
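
The prioritization recited in claims 1 and 2 can be sketched as a priority queue. This is an illustration under stated assumptions, not the patented implementation: the numeric priorities in `METHOD_PRIORITY`, the use of request frequency as a tie-breaker within a collection method, and the names `CollectedUrl` and `CategorizationQueue` are all hypothetical. The claims only require that URLs with a malicious data element are categorized on detection, and that other URLs receive a priority that differs by collection method.

```python
import heapq
import itertools
from dataclasses import dataclass

# Hypothetical base priorities per collection method (lower runs sooner).
# The specific ordering is an assumption made for this sketch.
METHOD_PRIORITY = {"honey_client": 0, "dns_database": 1, "web_crawler": 2}


@dataclass(frozen=True)
class CollectedUrl:
    url: str
    method: str                        # key into METHOD_PRIORITY
    request_count: int = 0             # observed request frequency
    has_malicious_element: bool = False


class CategorizationQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # keeps insertion order stable
        self.categorized = []              # (url, category) pairs

    def submit(self, item: CollectedUrl) -> None:
        # URLs with a malicious data element are categorized right away.
        if item.has_malicious_element:
            self.categorized.append((item.url, "malicious"))
            return
        # Otherwise queue by collection method, then by request frequency
        # (more requests first), then by arrival order.
        key = (
            METHOD_PRIORITY[item.method],
            -item.request_count,
            next(self._counter),
        )
        heapq.heappush(self._heap, (key, item))

    def drain(self) -> list:
        """Return queued URLs in the order they would be categorized."""
        order = []
        while self._heap:
            _, item = heapq.heappop(self._heap)
            order.append(item.url)
        return order
```

Because the counter makes every key unique, the heap never has to compare two `CollectedUrl` objects directly.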

10. A computer system for categorizing a URL, the system comprising:
- one or more hardware processors configured to:
  
  identify a first URL using a first collection method of a plurality of collection methods;
  
  determine whether the first URL contains a malicious data element;
  
  categorize the first URL in response to a determination that the first URL contains a malicious data element;
  
  in response to determining the first URL does not contain a malicious data element:
  
  assign a first categorization priority to the first URL based on the first URL being identified using the first collection method, and
  
  categorize the first URL based on the first categorization priority, wherein categorization of a URL comprises assigning a category to the URL based on a classification of at least one of web content or an Internet Protocol (IP) address identified by the URL;
  
  identify a second URL using a second collection method, wherein the first collection method and the second collection method are different and each are one of a web crawler, a Domain Name Server (DNS) database, and a honey client;
  
  determine whether the second URL contains a malicious data element;
  
  categorize the second URL in response to a determination that the second URL contains a malicious data element; and
  
  in response to determining the second URL does not contain a malicious data element:
  
  assign a second categorization priority to the second URL different than the first categorization priority based on the second URL being identified using the second collection method, and
  
  categorize the second URL based on the second categorization priority.
- - 11. The system of claim 10, wherein the one or more hardware processors are further configured to determine a frequency of requests for the web content identified by the first URL, and to prioritize categorization of the first URL based at least in part on the frequency of requests.
  - 12. The system of claim 10, wherein the one or more hardware processors are further configured to identify a third URL using a third collection method, and to prioritize categorization of the third URL at a different priority than either the categorization of the first or second URL based on the third URL having been identified using the third collection method and the first URL having been identified using the first collection method and the second URL having been identified using the second collection method.
  - 13. The system of claim 10, wherein the one or more hardware processors are further configured to categorize the first URL based on whether web content identified by the first URL includes active content.
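
Claim 13 conditions categorization on whether the web content includes active content. The sketch below shows one way such a check might look, assuming that "active content" means script-bearing or plugin-bearing HTML. That reading, the tag list in `ACTIVE_TAGS`, and the names `ActiveContentDetector` and `has_active_content` are assumptions for illustration; the claims do not define the detection method.

```python
from html.parser import HTMLParser

# Assumed set of tags treated as "active content" for this sketch.
ACTIVE_TAGS = {"script", "object", "embed", "applet", "iframe"}


class ActiveContentDetector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag in ACTIVE_TAGS:
            self.found.add(tag)
        # Heuristic: inline event handlers such as onclick or onerror
        # also carry executable script.
        for name, _value in attrs:
            if name.startswith("on"):
                self.found.add("event-handler")


def has_active_content(html: str) -> bool:
    """Return True if the HTML contains any assumed active-content marker."""
    detector = ActiveContentDetector()
    detector.feed(html)
    detector.close()
    return bool(detector.found)
```

The attribute check is a coarse heuristic that matches any attribute name beginning with "on", so it may over-report on unusual markup.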

14. A computer-implemented system for identifying URLs associated with malicious content, the system comprising:
- a hardware processor; and
  
  a memory for storing computer executable instructions that, when executed by the hardware processor, cause the hardware processor to perform the steps of:
  
  identifying a first URL using a first collection method of a plurality of collection methods;
  
  determining whether the first URL contains a malicious data element;
  
  categorizing the first URL in response to the first URL containing a malicious data element, wherein categorization of a URL comprises assigning a category to the URL based on a classification of at least one of web content or an Internet Protocol (IP) address identified by the URL;
  
  assigning a first categorization priority to the first URL based on the first URL being identified using the first collection method in response to the first URL not containing a malicious data element, and
  
  categorizing the first URL based on the first categorization priority in response to the first URL not containing a malicious data element,
  
  identifying a second URL using a second collection method, wherein the first collection method and the second collection method are different and each are one of a web crawler, a Domain Name Server (DNS) database, and a honey client;
  
  determining whether the second URL contains a malicious data element;
  
  categorizing the second URL in response to the determination that the second URL contains a malicious data element;
  
  assigning a second categorization priority in response to the second URL not containing a malicious data element, the second categorization priority different than the first categorization priority and based on the second URL having been identified using the second collection method; and
  
  categorizing the second URL based on the second categorization priority in response to the determination that the second URL does not contain a malicious data element.
- - 15. The computer-implemented system of claim 14, categorizing the first URL further comprising prioritizing categorization of the first URL based at least in part on a frequency of requests for the web content identified by the first URL.

16. A non-transitory computer readable storage medium comprising instructions that when executed cause a processor to perform a method of categorizing a uniform resource locator (URL) based on web content associated with the URL, the method comprising:
- identifying a first URL using a first collection method of a plurality of collection methods, wherein each of the plurality of collection methods is performed using at least one electronic processor;
  
  determining, using an electronic processor, whether the first URL contains a malicious data element;
  
  categorizing, using an electronic processor, the first URL in response to the first URL containing a malicious data element;
  
  in response to determining the first URL does not contain a malicious data element:
  
  assigning, using an electronic processor, a first categorization priority to the first URL based on the first URL being identified using the first collection method, and
  
  categorizing, using an electronic processor, the first URL based on the first categorization priority, wherein categorization of a URL comprises assigning a category to the URL based on a classification of at least one of web content or an Internet Protocol (IP) address identified by the URL;
  
  identifying a second URL using a second collection method, wherein the first collection method and the second collection method are different and each are one of a web crawler, a Domain Name Server (DNS) database, and a honey client;
  
  determining, using an electronic processor, whether the second URL contains a malicious data element;
  
  categorizing, using an electronic processor, the second URL in response to the second URL containing a malicious data element;
  
  in response to determining the second URL does not contain a malicious data element:
  
  assigning, using an electronic processor, a second categorization priority different than the first categorization priority based on the second URL having been identified using the second collection method, and
  
  categorizing, using an electronic processor, the second URL based on the second categorization priority.

Current Assignee: Forcepoint LLC (Francisco Partners Management LLC)
Original Assignee: Forcepoint LLC (Francisco Partners Management LLC)
Inventors: Hubbard, Dan; Verenini, Nicholas Joseph; Baddour, Victor Louie
Primary Examiner(s): Pham, Luu
Assistant Examiner(s): Jackson, Jenise E
Application Number: US13/164,688
Publication Number: US 20110252478A1
Time in Patent Office: 1,359 Days
Field of Search: 726/22, 726/24, 726/30
US Class Current: 726/24
CPC Class Codes:
G06F 16/285   Clustering or classification
G06F 16/951   Indexing; Web crawling tech...
H04L 63/1441   Countermeasures against mal...
