Referrer context aware target queue prioritization
First Claim
1. A method of prioritizing a web crawler target link queue, comprising:
retrieving referrer context information associated with a link to a remote object with respect to a plurality of client devices, the referrer context information identifying how the link was spread among the plurality of client devices, wherein the link is spread among the plurality of client devices via a plurality of different networking protocols and the retrieved referrer context information is different for the different networking protocols via which the link was spread;
aggregating the retrieved referrer context information;
analyzing the aggregated retrieved referrer context information for the link to the remote object to determine whether the aggregated retrieved referrer context information identifying how the link was spread among the plurality of client devices indicates a threat to the plurality of client devices; and
prioritizing the link to the remote object within the web crawler target link queue relative to other links in the web crawler target link queue based on the analyzed aggregated retrieved referrer context information, wherein a priority of the link is increased responsive to the determination that the aggregated retrieved referrer context information indicates the threat to the plurality of client devices.
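The four steps of claim 1 (retrieve, aggregate, analyze, prioritize) can be illustrated with a minimal sketch. It is not the patented implementation. The record fields, the function names, the priority values, and in particular the threat heuristic (a link seen by at least a threshold number of distinct clients over at least two protocols) are all assumptions chosen only to make the flow concrete; the claim does not specify how a threat is determined.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferrerContext:
    """One observation of a link on one client (hypothetical record layout)."""
    link: str
    client_id: str
    protocol: str   # e.g. "http", "smtp"
    referrer: str   # protocol-specific: referring URL, sender address, ...


def aggregate(records):
    """Group the retrieved referrer context records by link."""
    by_link = defaultdict(list)
    for record in records:
        by_link[record.link].append(record)
    return by_link


def indicates_threat(contexts, min_clients=3, min_protocols=2):
    """Assumed heuristic: wide spread across clients and protocols is suspicious."""
    clients = {c.client_id for c in contexts}
    protocols = {c.protocol for c in contexts}
    return len(clients) >= min_clients and len(protocols) >= min_protocols


def prioritize(queue_links, records, base_priority=1, threat_boost=10):
    """Return the queue reordered so that links flagged as threats come first."""
    by_link = aggregate(records)
    heap = []
    for order, link in enumerate(queue_links):
        priority = base_priority
        if indicates_threat(by_link.get(link, [])):
            priority += threat_boost  # priority is increased on a threat finding
        # heapq is a min-heap, so negate the priority; `order` keeps ties stable.
        heapq.heappush(heap, (-priority, order, link))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

In this sketch, links without a threat finding keep their original relative order, so the boost only moves flagged links ahead of the rest.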
Abstract
A computer, computer program product, and method prioritize a web crawler target link queue using referrer context information associated with a remote object link. An access statistics collection module detects links to remote objects and retrieves referrer context information for the links. An access statistics back end module receives and stores the referrer context information from the access statistics collection module. The referrer context information is analyzed by a target list prioritization module that uses the results of the analysis to prioritize a target queue of a web crawler. The referrer context information is an important resource in identifying information about how a link spreads, e.g., for threat detection or identification of popular links for indexing to produce more relevant search results.
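The division of labor between the collection module and the back end module can be sketched as below. This is an assumption-laden illustration rather than the design described in the specification: the class name, the dictionary layout, and the choice of the HTTP `Referer` header and the email `From` header as examples of protocol-specific referrer context are all hypothetical. The `popularity` helper reflects the abstract's second use case, identifying popular links for indexing.

```python
from collections import defaultdict


def context_from_http(headers):
    """Collection side: referrer context for a link reached over HTTP (assumed fields)."""
    return {"protocol": "http", "referrer": headers.get("Referer", "")}


def context_from_email(message_headers):
    """Collection side: referrer context for a link received by email (assumed fields)."""
    return {"protocol": "smtp", "referrer": message_headers.get("From", "")}


class AccessStatisticsBackEnd:
    """Back end side: receives and stores referrer context per link."""

    def __init__(self):
        self._store = defaultdict(list)

    def receive(self, link, client_id, context):
        # Keep the client identifier with the protocol-specific context.
        self._store[link].append({"client": client_id, **context})

    def contexts_for(self, link):
        """Return a copy of everything stored for this link."""
        return list(self._store.get(link, []))

    def popularity(self, link):
        """Number of distinct clients that reported the link."""
        return len({entry["client"] for entry in self._store.get(link, [])})
```

A prioritization module would then read `contexts_for(link)` or `popularity(link)` when ordering the crawler's target queue.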
17 Claims
1. A method of prioritizing a web crawler target link queue, comprising:
retrieving referrer context information associated with a link to a remote object with respect to a plurality of client devices, the referrer context information identifying how the link was spread among the plurality of client devices, wherein the link is spread among the plurality of client devices via a plurality of different networking protocols and the retrieved referrer context information is different for the different networking protocols via which the link was spread;
aggregating the retrieved referrer context information;
analyzing the aggregated retrieved referrer context information for the link to the remote object to determine whether the aggregated retrieved referrer context information identifying how the link was spread among the plurality of client devices indicates a threat to the plurality of client devices; and
prioritizing the link to the remote object within the web crawler target link queue relative to other links in the web crawler target link queue based on the analyzed aggregated retrieved referrer context information, wherein a priority of the link is increased responsive to the determination that the aggregated retrieved referrer context information indicates the threat to the plurality of client devices.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A non-transitory computer-readable storage medium having computer program instructions embodied therein for prioritizing a web crawler target link queue, comprising:
an access statistics collection module configured to:
retrieve referrer context information associated with a link to a remote object with respect to a plurality of client devices, the referrer context information identifying how the link was spread among the plurality of client devices, wherein the link is spread among the plurality of client devices via a plurality of different networking protocols and the retrieved referrer context information is different for the different networking protocols via which the link was spread; and
a target list prioritization module configured to:
aggregate the retrieved referrer context information;
analyze the aggregated retrieved referrer context information for the link to the remote object to determine whether the aggregated retrieved referrer context information identifying how the link was spread among the plurality of client devices indicates a threat to the plurality of client devices; and
prioritize the link to the remote object within the web crawler target link queue relative to other links in the web crawler target link queue based on the analyzed aggregated retrieved referrer context information, wherein a priority of the link is increased responsive to the determination that the aggregated retrieved referrer context information indicates the threat to the plurality of client devices.
Dependent claims: 12, 13, 14, 15, 16
17. A computer configured to prioritize a web crawler target link queue, comprising:
a non-transitory computer-readable storage medium having computer program instructions embodied therein comprising:
an access statistics collection module configured to:
retrieve referrer context information associated with a link to a remote object with respect to a plurality of client devices, the referrer context information identifying how the link was spread among the plurality of client devices, wherein the link is spread among the plurality of client devices via a plurality of different networking protocols and the retrieved referrer context information is different for the different networking protocols via which the link was spread; and
a target list prioritization module configured to:
aggregate the retrieved referrer context information;
analyze the aggregated retrieved referrer context information for the link to the remote object to determine whether the aggregated retrieved referrer context information identifying how the link was spread among the plurality of client devices indicates a threat to the plurality of client devices; and
prioritize the link to the remote object within the web crawler target link queue relative to other links in the web crawler target link queue based on the analyzed aggregated retrieved referrer context information, wherein a priority of the link is increased responsive to the determination that the aggregated retrieved referrer context information indicates the threat to the plurality of client devices; and
a processor for executing the computer program instructions.
Specification