System and method of analyzing web content
Abstract
A system and method are provided for identifying inappropriate content in websites on a network. Unrecognized uniform resource locators (URLs) or other web content are accessed by workstations and are identified as possibly having malicious content. The URLs or web content may be preprocessed within a gateway server module or some other software module to collect additional information related to the URLs. The URLs may be scanned for known attack signatures, and if any are found, they may be tagged as candidate URLs in need of further analysis by a classification module.
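The signature-scanning step described in the abstract can be illustrated with a minimal Python sketch. The signature names and regular expressions below are hypothetical examples chosen for illustration, not signatures taken from the patent. A URL is tagged as a candidate when its fetched content matches any of them.

```python
import re
from dataclasses import dataclass, field

# Hypothetical attack signatures for illustration only; a real deployment
# would load a maintained signature set instead of hard-coding patterns.
ATTACK_SIGNATURES = {
    # iframe with a zero width or height (a common way to hide injected frames)
    "hidden_iframe": re.compile(r"<iframe[^>]*(?:width|height)\s*=\s*[\"']?0\b", re.IGNORECASE),
    # obfuscated script executed via eval(unescape(...))
    "eval_unescape": re.compile(r"eval\s*\(\s*unescape\s*\(", re.IGNORECASE),
    # script tag injected with document.write
    "document_write_script": re.compile(r"document\.write\s*\(\s*[\"']<script", re.IGNORECASE),
}


@dataclass
class ScanResult:
    url: str
    matched: list[str] = field(default_factory=list)

    @property
    def is_candidate(self) -> bool:
        # Any signature match tags the URL for further analysis.
        return bool(self.matched)


def scan_for_signatures(url: str, content: str) -> ScanResult:
    """Scan fetched web content for the known attack signatures above."""
    matched = [name for name, pattern in ATTACK_SIGNATURES.items() if pattern.search(content)]
    return ScanResult(url=url, matched=matched)


def tag_candidates(pages: dict[str, str]) -> list[str]:
    """Return, in input order, the URLs whose content matched at least one signature."""
    return [url for url, content in pages.items() if scan_for_signatures(url, content).is_candidate]
```

Pattern matching of this kind only flags candidates; in the abstract's description, tagged URLs are then handed to a classification module for further analysis.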
16 Claims
1. A computer-implemented method of categorizing a uniform resource locator (URL) based on web content associated with the URL, the method comprising:
identifying a first URL using a first collection method of a plurality of collection methods, wherein each of the plurality of collection methods is performed using at least one electronic processor;
determining, using an electronic processor, whether the first URL contains a malicious data element;
categorizing, using an electronic processor, the first URL in response to a determination that the first URL contains a malicious data element;
in response to determining the first URL does not contain a malicious data element:
assigning, using an electronic processor, a first categorization priority to the first URL based on the first URL being identified using the first collection method, and categorizing, using an electronic processor, the first URL based on the first categorization priority, wherein categorization of a URL comprises assigning a category to the URL based on a classification of at least one of web content or an Internet Protocol (IP) address identified by the URL;
identifying a second URL using a second collection method, wherein the first collection method and the second collection method are different and each are one of a web crawler, a Domain Name Server (DNS) database, and a honey client;
determining, using an electronic processor, whether the second URL contains a malicious data element;
categorizing, using an electronic processor, the second URL in response to a determination that the second URL contains a malicious data element;
in response to determining the second URL does not contain a malicious data element:
assigning, using an electronic processor, a second categorization priority different than the first categorization priority based on the second URL having been identified using the second collection method, and categorizing, using an electronic processor, the second URL based on the second categorization priority.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
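The flow of claim 1 can be sketched in Python under stated assumptions: the priority value assigned to each collection method, the `CategorizationQueue` class, and the `contains_malicious` and `categorize` callables are hypothetical names introduced here for illustration. URLs found to contain a malicious data element are categorized as soon as they are submitted; the rest wait in a priority queue (Python's `heapq`) keyed on the collection method that identified them.

```python
import heapq
import itertools
from enum import Enum
from typing import Callable


class CollectionMethod(Enum):
    # The three collection methods named in the claim.
    WEB_CRAWLER = "web crawler"
    DNS_DATABASE = "DNS database"
    HONEY_CLIENT = "honey client"


# Hypothetical priorities (lower number = categorized sooner). The claim only
# requires that URLs from different collection methods get different priorities.
PRIORITY_BY_METHOD = {
    CollectionMethod.HONEY_CLIENT: 0,
    CollectionMethod.WEB_CRAWLER: 1,
    CollectionMethod.DNS_DATABASE: 2,
}


class CategorizationQueue:
    """Hypothetical queue: malicious URLs bypass it, others are ordered by priority."""

    def __init__(self, contains_malicious: Callable[[str], bool], categorize: Callable[[str], str]):
        self._contains_malicious = contains_malicious
        self._categorize = categorize
        self._heap: list[tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order within one priority
        self.results: dict[str, str] = {}

    def submit(self, url: str, method: CollectionMethod) -> None:
        if self._contains_malicious(url):
            # Malicious URL: categorize immediately, no priority assigned.
            self.results[url] = self._categorize(url)
        else:
            # Non-malicious URL: assign a priority based on how it was collected.
            priority = PRIORITY_BY_METHOD[method]
            heapq.heappush(self._heap, (priority, next(self._counter), url))

    def drain(self) -> list[str]:
        """Categorize queued URLs in priority order and return that order."""
        order = []
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            self.results[url] = self._categorize(url)
            order.append(url)
        return order
```

Putting honey-client URLs first is just one possible choice; swapping the values in `PRIORITY_BY_METHOD` changes the order without touching the queue logic.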
10. A computer system for categorizing a URL, the system comprising:
one or more hardware processors configured to:
identify a first URL using a first collection method of a plurality of collection methods;
determine whether the first URL contains a malicious data element;
categorize the first URL in response to a determination that the first URL contains a malicious data element;
in response to determining the first URL does not contain a malicious data element:
assign a first categorization priority to the first URL based on the first URL being identified using the first collection method, and categorize the first URL based on the first categorization priority, wherein categorization of a URL comprises assigning a category to the URL based on a classification of at least one of web content or an Internet Protocol (IP) address identified by the URL;
identify a second URL using a second collection method, wherein the first collection method and the second collection method are different and each are one of a web crawler, a Domain Name Server (DNS) database, and a honey client;
determine whether the second URL contains a malicious data element;
categorize the second URL in response to a determination that the second URL contains a malicious data element; and
in response to determining the second URL does not contain a malicious data element:
assign a second categorization priority to the second URL different than the first categorization priority based on the second URL being identified using the second collection method, and categorize the second URL based on the second categorization priority.
Dependent claims: 11, 12, 13.
14. A computer-implemented system for identifying URLs associated with malicious content, the system comprising:
a hardware processor; and
a memory for storing computer executable instructions that, when executed by the hardware processor, cause the hardware processor to perform the steps of:
identifying a first URL using a first collection method of a plurality of collection methods;
determining whether the first URL contains a malicious data element;
categorizing the first URL in response to the first URL containing a malicious data element, wherein categorization of a URL comprises assigning a category to the URL based on a classification of at least one of web content or an Internet Protocol (IP) address identified by the URL;
assigning a first categorization priority to the first URL based on the first URL being identified using the first collection method in response to the first URL not containing a malicious data element, and categorizing the first URL based on the first categorization priority in response to the first URL not containing a malicious data element;
identifying a second URL using a second collection method, wherein the first collection method and the second collection method are different and each are one of a web crawler, a Domain Name Server (DNS) database, and a honey client;
determining whether the second URL contains a malicious data element;
categorizing the second URL in response to the determination that the second URL contains a malicious data element;
assigning a second categorization priority in response to the second URL not containing a malicious data element, the second categorization priority different than the first categorization priority and based on the second URL having been identified using the second collection method; and
categorizing the second URL based on the second categorization priority in response to the determination that the second URL does not contain a malicious data element.
Dependent claims: 15.
16. A non-transitory computer readable storage medium comprising instructions that when executed cause a processor to perform a method of categorizing a uniform resource locator (URL) based on web content associated with the URL, the method comprising:
identifying a first URL using a first collection method of a plurality of collection methods, wherein each of the plurality of collection methods is performed using at least one electronic processor;
determining, using an electronic processor, whether the first URL contains a malicious data element;
categorizing, using an electronic processor, the first URL in response to the first URL containing a malicious data element;
in response to determining the first URL does not contain a malicious data element:
assigning, using an electronic processor, a first categorization priority to the first URL based on the first URL being identified using the first collection method, and categorizing, using an electronic processor, the first URL based on the first categorization priority, wherein categorization of a URL comprises assigning a category to the URL based on a classification of at least one of web content or an Internet Protocol (IP) address identified by the URL;
identifying a second URL using a second collection method, wherein the first collection method and the second collection method are different and each are one of a web crawler, a Domain Name Server (DNS) database, and a honey client;
determining, using an electronic processor, whether the second URL contains a malicious data element;
categorizing, using an electronic processor, the second URL in response to the second URL containing a malicious data element;
in response to determining the second URL does not contain a malicious data element:
assigning, using an electronic processor, a second categorization priority different than the first categorization priority based on the second URL having been identified using the second collection method, and categorizing, using an electronic processor, the second URL based on the second categorization priority.
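Each independent claim includes the same "wherein" clause: categorization assigns a category based on the web content or the IP address identified by the URL. A minimal Python sketch of that step might look like the following. The IP ranges (the documentation-reserved blocks from RFC 5737), the keyword table, and the rule of checking the IP address before the content are all hypothetical choices made for illustration.

```python
import ipaddress
from typing import Optional

# Hypothetical category data. The networks are RFC 5737 documentation ranges.
IP_CATEGORIES = {
    ipaddress.ip_network("203.0.113.0/24"): "malicious hosting",
    ipaddress.ip_network("198.51.100.0/24"): "advertising",
}
KEYWORD_CATEGORIES = {
    "casino": "gambling",
    "poker": "gambling",
    "invoice": "business",
}


def classify_ip(ip: str) -> Optional[str]:
    """Return the category of the first known network containing the address."""
    addr = ipaddress.ip_address(ip)
    for network, category in IP_CATEGORIES.items():
        # Membership is simply False when IPv4/IPv6 versions differ.
        if addr in network:
            return category
    return None


def classify_content(text: str) -> Optional[str]:
    """Very simple keyword classifier using whitespace tokenization."""
    words = text.lower().split()
    for keyword, category in KEYWORD_CATEGORIES.items():
        if keyword in words:
            return category
    return None


def categorize_url(ip: Optional[str], content: Optional[str]) -> str:
    """Assign a category from at least one of the IP address or the web content.

    The IP address is checked first (a hypothetical design choice); the
    content is consulted only if the IP address yields no category.
    """
    if ip is not None:
        category = classify_ip(ip)
        if category:
            return category
    if content is not None:
        category = classify_content(content)
        if category:
            return category
    return "uncategorized"
```

Resolving the URL to an IP address and fetching its content are left to the caller, so the sketch covers only the classification itself.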
Specification