Using exceptional changes in webgraph snapshots over time for internet entity marking
Abstract
Techniques are provided through which “suspicious” web pages may be identified automatically. A “suspicious” web page possesses characteristics that indicate some manipulation to artificially inflate the position of the web page within ranked search results. Web pages may be represented as nodes within a graph. Links between web pages may be represented as directed edges between the nodes. “Snapshots” of the current state of a network of interlinked web pages may be automatically generated at different times. In the time interval between snapshots, the state of the network may change. By comparing an earlier snapshot to a later snapshot, such changes can be identified. Extreme changes, which are deemed to vary significantly from the normal range of expected changes, can be detected automatically. Web pages relative to which these extreme changes have occurred may be marked as suspicious web pages which may merit further investigation or action.
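The snapshot comparison described above can be illustrated with a minimal sketch. It assumes each snapshot is a dictionary mapping a page to the set of pages it links to, and it treats a change in inbound-link count as "extreme" when it lies more than `z_cutoff` standard deviations above the mean change. The function names, the choice of inbound links as the measured quantity, and the z-score test are all illustrative assumptions; the abstract does not specify which statistic defines the normal range.

```python
from collections import Counter
from statistics import mean, pstdev


def in_degrees(snapshot):
    """Count inbound links per page; snapshot maps page -> set of linked pages."""
    counts = Counter({page: 0 for page in snapshot})
    for targets in snapshot.values():
        for target in targets:
            counts[target] += 1
    return counts


def suspicious_pages(earlier, later, z_cutoff=2.0):
    """Return pages whose inbound-link gain is an outlier between two snapshots.

    The z-score cutoff is a hypothetical stand-in for "varies significantly
    from the normal range of expected changes".
    """
    before, after = in_degrees(earlier), in_degrees(later)
    pages = set(before) | set(after)
    if not pages:
        return set()
    # Counter returns 0 for pages absent from one of the snapshots.
    deltas = {page: after[page] - before[page] for page in pages}
    mu = mean(deltas.values())
    sigma = pstdev(deltas.values())
    if sigma == 0:
        return set()  # every page changed by the same amount
    return {page for page, d in deltas.items() if (d - mu) / sigma > z_cutoff}
```

In a real crawl the distribution of changes is likely to be heavy-tailed, so a percentile-based or robust cutoff may suit better than a plain z-score; the sketch only shows the compare-and-flag structure.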
22 Claims
1. A method comprising:
- measuring, at a first time, for each host of a plurality of hosts, a number of pages hosted on that host;
- measuring, at a second time that differs from the first time, for each host of the plurality of hosts, a number of pages hosted on that host;
- determining, for each host of the plurality of hosts, a rate of growth in a number of pages hosted on that host between the first time and the second time; and
- identifying selected hosts, in the plurality of hosts, which are associated with rates of growth that exceed a specified threshold;
- wherein the method is performed by at least one device comprising a processor.

Dependent claims: 2, 3, 4, 5
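The steps of claim 1 can be sketched as follows. This is only an illustration under stated assumptions: the two measurements are given as dictionaries mapping host name to page count, times are plain numbers (for example, days), and "rate of growth" is taken to mean pages gained per unit of time. The claim does not fix that definition; a relative (percentage) rate would be an equally valid reading. The function name `fast_growing_hosts` is hypothetical.

```python
def fast_growing_hosts(pages_at_t1, pages_at_t2, t1, t2, threshold):
    """Return {host: rate} for hosts whose page-count growth rate exceeds threshold.

    Assumption: rate = (pages at t2 - pages at t1) / (t2 - t1).
    A host missing from one measurement is treated as having 0 pages then.
    """
    if t2 == t1:
        raise ValueError("the second time must differ from the first time")
    elapsed = t2 - t1
    selected = {}
    for host in set(pages_at_t1) | set(pages_at_t2):
        gained = pages_at_t2.get(host, 0) - pages_at_t1.get(host, 0)
        rate = gained / elapsed
        if rate > threshold:
            selected[host] = rate
    return selected
```

For example, with measurements taken 10 days apart, a host that grows from 50 to 5,050 pages has a rate of 500 pages per day and would be selected at a threshold of 100, while a host that grows from 100 to 110 pages would not.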
6. A method comprising:
- measuring, at a first time, for each domain of a plurality of domains, a number of hosts contained within that domain;
- measuring, at a second time that differs from the first time, for each domain of the plurality of domains, a number of hosts contained within that domain;
- determining, for each domain of the plurality of domains, a rate of growth in a number of hosts contained within that domain between the first time and the second time; and
- identifying selected domains, in the plurality of domains, which are associated with rates of growth that exceed a specified threshold;
- wherein the method is performed by at least one device comprising a processor.

Dependent claims: 7, 8, 9, 10, 11
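For the domain-level variant in claim 6, the measured quantity is the number of hosts within each domain. The sketch below assumes each snapshot supplies a list of host names and, as a deliberate simplification, treats the last two dot-separated labels as the domain. That rule is wrong for suffixes such as `co.uk`; a production system would likely consult a public-suffix list. All names here are hypothetical, and the rate definition (hosts gained per unit of time) is an assumption.

```python
from collections import defaultdict


def hosts_per_domain(hosts):
    """Count distinct hosts under each domain (naive rule: last two labels)."""
    grouped = defaultdict(set)
    for host in hosts:
        name = host.lower().rstrip(".")
        domain = ".".join(name.split(".")[-2:])
        grouped[domain].add(name)
    return {domain: len(names) for domain, names in grouped.items()}


def fast_growing_domains(hosts_t1, hosts_t2, elapsed, threshold):
    """Return {domain: rate} for domains gaining hosts faster than threshold."""
    if elapsed == 0:
        raise ValueError("the two measurement times must differ")
    first, second = hosts_per_domain(hosts_t1), hosts_per_domain(hosts_t2)
    selected = {}
    for domain in set(first) | set(second):
        rate = (second.get(domain, 0) - first.get(domain, 0)) / elapsed
        if rate > threshold:
            selected[domain] = rate
    return selected
```

Grouping by a set of names keeps duplicate or differently cased host entries from inflating the count.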
12. A non-transitory computer-readable storage medium storing instructions, which when executed by one or more processors, perform steps comprising:
- measuring, at a first time, for each host of a plurality of hosts, a number of pages hosted on that host;
- measuring, at a second time that differs from the first time, for each host of the plurality of hosts, a number of pages hosted on that host;
- determining, for each host of the plurality of hosts, a rate of growth in a number of pages hosted on that host between the first time and the second time;
- identifying selected hosts, in the plurality of hosts, which are associated with rates of growth that exceed a specified threshold.

Dependent claims: 14, 15, 16, 17
13. A non-transitory computer-readable storage medium storing instructions, which when executed by one or more processors, perform steps comprising:
- measuring, at a first time, for each domain of a plurality of domains, a number of hosts contained within that domain;
- measuring, at a second time that differs from the first time, for each domain of the plurality of domains, a number of hosts contained within that domain;
- determining, for each domain of the plurality of domains, a rate of growth in a number of hosts contained within that domain between the first time and the second time; and
- identifying selected domains, in the plurality of domains, which are associated with rates of growth that exceed a specified threshold.

Dependent claims: 18, 19, 20, 21, 22
Specification