Methods and systems for analyzing data related to possible online fraud
Abstract
Various embodiments of the invention provide methods, systems and software for analyzing data. In particular embodiments, for example, a set of data about a web site may be analyzed to determine whether the web site is likely to be illegitimate (e.g., to be involved in a fraudulent scheme, such as a phishing scheme, the sale of gray market goods, etc.). In an exemplary embodiment, a set of data may be divided into a plurality of components (each of which, in some cases, may be considered a separate data set). Merely by way of example, a set of data may comprise data gathered from a plurality of data sources, and/or each component may comprise data gathered from one of the plurality of data sources. As another example, a set of data may comprise a document with a plurality of sections, and each component may comprise one of the plurality of sections. Those skilled in the art will appreciate that the analysis of a particular component may comprise certain tests and/or evaluations, and that the analysis of another component may comprise different tests and/or evaluations. In other cases, the analysis of each component may comprise similar tests and/or evaluations. The variety of tests and/or evaluations generally will be implementation specific.
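As a rough illustration of the component-wise analysis described in the abstract, the following Python sketch treats a data set as a mapping of named components and applies a different test to each. The component names (`body`, `url`), the suspect phrases, and both scoring rules are hypothetical and chosen only for illustration; the source does not prescribe any particular tests.

```python
from typing import Callable, Dict


def score_body(text: str) -> float:
    """Hypothetical body test: fraction of suspect phrases found in the text."""
    suspect_phrases = ("verify your account", "confirm your password")
    hits = sum(phrase in text.lower() for phrase in suspect_phrases)
    return hits / len(suspect_phrases)


def score_url(url: str) -> float:
    """Hypothetical URL test: flag an '@' sign, which can obscure the real host."""
    return 1.0 if "@" in url else 0.0


# Each component may have its own test; components without a test are skipped.
TESTS: Dict[str, Callable[[str], float]] = {
    "body": score_body,
    "url": score_url,
}


def analyze(data_set: Dict[str, str]) -> Dict[str, float]:
    """Score every component of the data set that has a registered test."""
    return {
        name: TESTS[name](value)
        for name, value in data_set.items()
        if name in TESTS
    }
```

Keeping the tests in a lookup table reflects the point that the choice of tests is implementation specific: a component can be given a new test without changing the analysis loop.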
31 Claims
1. A method, comprising:
periodically collecting, with a computer from a plurality of different sources, a set of data related to a web site, wherein the set of data comprises a web page on the web site;
accessing, with the computer, the set of data related to the web site;
dividing, with the computer, the set of data into a plurality of components, wherein the plurality of components comprises an Internet Protocol (“IP”) address associated with the web site and a body field comprising text;
analyzing at least two of the components, wherein analyzing the at least two of the plurality of components comprises:
analyzing the text of the body field to identify suspect text;
updating, with the computer, a database comprising suspect text to include at least a portion of the suspect text of the body field;
identifying a domain identified by a uniform resource locator (“URL”) of the web site;
identifying an Internet Protocol (“IP”) block assigned to the domain; and
comparing the IP address of the web site with the IP block assigned to the domain;
assigning at least one score to one or more of the analyzed components; and
categorizing the web site as a possibly fraudulent web site, based at least in part on the at least one score.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9)
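The step of comparing the web site's IP address with the IP block assigned to the domain could be sketched as follows, using Python's standard `ipaddress` module. This is a minimal sketch under stated assumptions: the function names are hypothetical, the assigned block is assumed to be already known in CIDR notation (how it is looked up is outside this sketch), and the binary score is an illustrative choice rather than the claimed scoring method.

```python
import ipaddress


def ip_in_assigned_block(site_ip: str, assigned_block: str) -> bool:
    """Return True if the site's IP address lies inside the domain's IP block.

    `assigned_block` is assumed to be given in CIDR notation, e.g. "192.0.2.0/24".
    """
    return ipaddress.ip_address(site_ip) in ipaddress.ip_network(assigned_block)


def ip_block_score(site_ip: str, assigned_block: str) -> float:
    """Hypothetical score: 1.0 (suspicious) on a mismatch, 0.0 otherwise."""
    return 0.0 if ip_in_assigned_block(site_ip, assigned_block) else 1.0
```

A site served from an address outside the block assigned to the domain named in its URL would receive the higher score here, which can then feed into the overall categorization.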
10. A method, comprising:
periodically collecting, with a computer from a plurality of different sources, a set of data related to a web site, wherein the set of data comprises a web page on the web site; and
performing, with the computer, a plurality of tests on the web site, wherein the plurality of tests comprises:
accessing the set of data, the set of data comprising data about a domain associated with the web site, including an Internet Protocol (“IP”) address associated with the web site and a body field comprising text;
identifying a domain identified by a uniform resource locator (“URL”) of the web site;
identifying an Internet Protocol (“IP”) block assigned to the domain;
comparing the IP address of the web site with the IP block assigned to the domain;
identifying suspect text within the body field;
updating a database comprising suspect text to include at least a portion of the suspect text of the body field;
assigning a score based on each of the plurality of tests;
assigning a composite score to the web site based on the scores for each of the plurality of tests; and
categorizing the web site based at least in part on the composite score.
(Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
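One way to picture the composite-score step is the short Python sketch below. The claim does not say how per-test scores are combined or how categories are chosen, so the unweighted mean, the 0.5 threshold, the test names, and the category labels are all assumptions made for illustration.

```python
from typing import Dict


def composite_score(test_scores: Dict[str, float]) -> float:
    """Combine per-test scores into one value (assumed here: unweighted mean)."""
    if not test_scores:
        raise ValueError("at least one test score is required")
    return sum(test_scores.values()) / len(test_scores)


def categorize(score: float, threshold: float = 0.5) -> str:
    """Map a composite score to a category using a hypothetical threshold."""
    return "possibly fraudulent" if score >= threshold else "not flagged"
```

For example, per-test scores of 1.0 for the IP block comparison and 0.5 for suspect text give a composite score of 0.75, which this sketch would categorize as possibly fraudulent.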
28. A computer system, comprising a hardware processor and a set of instructions executable by the hardware processor, the set of instructions comprising:
instructions for periodically collecting from a plurality of different sources a set of data related to a web site, wherein the set of data comprises a web page on the web site;
instructions for accessing the set of data related to the web site;
instructions for dividing the set of data into a plurality of components, the plurality of components comprises an Internet Protocol (“IP”) address associated with the web site and a body field comprising text;
instructions for analyzing at least two of the plurality of components, comprising:
instructions for analyzing the text of the body field to identify suspect text;
instructions for updating a database comprising suspect text to include at least a portion of the suspect text of the body field;
instructions for identifying a domain identified by a uniform resource locator (“URL”) of the web site;
instructions for identifying an Internet Protocol (“IP”) block assigned to the domain; and
instructions for comparing the IP address of the web site with the IP block assigned to the domain; and
instructions for assigning at least one score to one or more of the analyzed components; and
instructions for categorizing the web site as a possibly fraudulent web site, based at least in part on the at least one score.
29. A computer system, comprising a hardware processor and a set of instructions executable by the hardware processor, the set of instructions comprising:
instructions for periodically collecting from a plurality of different sources a set of data related to a web site, wherein the set of data comprises a web page on the web site; and
instructions for performing a plurality of tests on the web site, wherein the instructions for performing the plurality of tests comprises:
instructions for accessing the set of data, the set of data comprising data about a domain associated with the web site, including an Internet Protocol (“IP”) address associated with the web site and a body field comprising text;
instructions for identifying a domain identified by a uniform resource locator (“URL”) of the web site;
instructions for identifying an Internet Protocol (“IP”) block assigned to the domain; and
instructions for comparing the IP address of the web site with the IP block assigned to the domain;
instructions for identifying suspect text within the body field;
instructions for updating a database comprising suspect text to include at least a portion of the suspect text of the body field;
instructions for assigning a score based on each of the plurality of tests;
instructions for assigning a composite score to the web site based on the scores for each of the plurality of tests; and
instructions for categorizing the web site based at least in part on the composite score.
30. A software program embodied on a non-transitory computer readable medium, the software program comprising a set of instructions executable by one or more computers, the set of instructions comprising:
instructions for periodically collecting from a plurality of different sources a set of data related to a web site, wherein the set of data comprises a web page on the web site;
instructions for accessing the set of data related to the web site;
instructions for dividing the set of data into a plurality of components, wherein the plurality of components comprises an Internet Protocol (“IP”) address associated with the web site and a body field comprising text;
instructions for analyzing the text of the body field to identify suspect text;
instructions for updating a database comprising suspect text to include at least a portion of the suspect text of the body field;
instructions for identifying a domain identified by a uniform resource locator (“URL”) of the web site;
instructions for identifying an Internet Protocol (“IP”) block assigned to the domain; and
instructions for comparing the IP address of the web site with the IP block assigned to the domain;
instructions for assigning scores to at least some of the analyzed components; and
instructions for categorizing the web site as a possibly fraudulent web site, based at least in part on one or more of the scores.
31. A software program embodied on a non-transitory computer readable medium, the software program comprising a set of instructions executable by one or more computers, the set of instructions comprising:
instructions for periodically collecting from a plurality of different sources a set of data related to a web site, wherein the set of data comprises a web page on the web site;
instructions for performing a plurality of tests on the web site, wherein the instructions for performing the plurality of tests comprises:
instructions for accessing the set of data, the set of data comprising data about a domain associated with the web site, including an Internet Protocol (“IP”) address associated with the web site and a body field comprising text;
instructions for identifying a domain identified by a uniform resource locator (“URL”) of the web site;
instructions for identifying an Internet Protocol (“IP”) block assigned to the domain; and
instructions for comparing the IP address of the web site with the IP block assigned to the domain;
instructions for identifying suspect text within the body field;
instructions for updating a database comprising suspect text to include at least a portion of the suspect text of the body field;
instructions for assigning a score based on each of the plurality of tests;
instructions for assigning a composite score to the web site based on the scores for each of the plurality of tests; and
instructions for categorizing the web site based at least in part on the composite score.
Specification