Detecting a network crawler
First Claim
1. A computer-implemented method, comprising:
- receiving, by a computer system associated with an electronic marketplace, a request for a first web page of a web site of the electronic marketplace, the request received from a computing device;
- inserting, by the computer system in the first web page, a universal resource locator (URL) of a second web page of the web site and code, the second web page inaccessible to web crawlers based at least in part on a robots exclusion protocol of the web site identifying the second web page, the URL inserted in markup language of the first web page, the code comprising statements of a programmatic scripting language in accordance with an ECMAScript standard and configured to, upon execution of the code at the computing device:
  - determine a presence or absence of the URL in a browser history stored at the computing device and, if the URL is present in the browser history, one or more of: a state or a style attribute of the URL from the browser history, and
  - determine, based at least in part on the presence or absence of the URL, whether the second web page was accessed by the computing device;
- providing, by the computer system, the first web page to the computing device based at least in part on the request;
- receiving, by the computer system from the computing device, an indication that the second web page was not accessed prior to providing the first web page to the computing device, the indication received based at least in part on a determination that the URL is absent from the browser history or on a determination of the state or style attribute of the URL if the URL is present in the browser history, the determination based at least in part on an execution of the code at the computing device; and
- determining, based at least in part on the indication, that the request for the first web page is associated with a web crawler hosted on the computing device.
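The inserting step of the claim can be pictured with a minimal server-side sketch. It assumes the second page is listed in a robots.txt `Disallow` rule and that the URL and code are added just before `</body>`; the function names, the `probe-link` id, and the example paths are hypothetical and are not taken from the patent.

```javascript
// Hypothetical helpers; the claim does not prescribe any of these names.

// Build a robots.txt body asking compliant crawlers to skip the second page.
function buildRobotsTxt(secondPagePath) {
  return `User-agent: *\nDisallow: ${secondPagePath}\n`;
}

// Escape a string for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Insert the second page's URL (as a link in the markup) and a script
// reference just before </body> of the first page.
function insertProbe(firstPageHtml, secondPageUrl, probeScriptUrl) {
  const snippet =
    `<a id="probe-link" href="${escapeAttr(secondPageUrl)}"></a>` +
    `<script src="${escapeAttr(probeScriptUrl)}"></script>`;
  const idx = firstPageHtml.lastIndexOf("</body>");
  if (idx === -1) {
    // Assumption: with no </body> tag, appending is an acceptable fallback.
    return firstPageHtml + snippet;
  }
  return firstPageHtml.slice(0, idx) + snippet + firstPageHtml.slice(idx);
}
```

For example, `insertProbe(html, "/hidden/second-page.html", "/probe.js")` would return the first page with the link and script added, while `buildRobotsTxt("/hidden/second-page.html")` would produce the matching exclusion rule.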
Abstract
Techniques for detecting a network crawler may be described. In particular, a request for information may be received from a computing system. Based on this request, a network-based document may be provided to the computing system. The network-based document may include a portion of the information, code, and an identifier of another network-based document. The code may be configured to, upon execution, determine whether the other network-based document was accessed prior to providing the network-based document to the computing system. An indication whether the other network-based document was accessed may be received from the computing system. The indication may be received based on an execution of the code at the computing system. Based on the indication, the request for the information may be determined as being associated with a network crawler hosted on the computing system.
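The final determination described above can be sketched as a small decision function. Following the first claim, it treats an indication that the other document was not accessed as the crawler signal. The field name `secondDocumentAccessed` and the "undetermined" outcome for every other case (including a missing indication) are assumptions made for illustration, not details from the patent.

```javascript
// Hypothetical shape of the indication: { secondDocumentAccessed: boolean }.
function classifyRequest(indication) {
  // Per the first claim, "not accessed" is the signal tied to a crawler.
  if (indication && indication.secondDocumentAccessed === false) {
    return "crawler";
  }
  // Assumption: any other outcome is left undecided in this sketch.
  return "undetermined";
}
```

A caller might combine this result with other signals before acting on it; the source does not say how the determination is used downstream.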
20 Claims
1. A computer-implemented method, recited in full under "First Claim" above. Dependent claims: 2, 3, 4.
5. One or more non-transitory computer-readable media comprising instructions that, when executed with one or more processors, cause a system to at least:
- receive, from a computing system, a request for information;
- provide, to the computing system, a first network-based document of a network-based resource, the first network-based document comprising a portion of the information, code, and an identifier of a second network-based document of the network-based resource, the code comprising statements of a programmatic scripting language in accordance with an ECMAScript standard and configured to, upon execution:
  - determine a presence or an absence of the identifier in a history stored at the computing system and, if the identifier is present in the history, one or more of: a state or a style attribute of the identifier from the history, and
  - determine whether the second network-based document was accessed prior to providing the first network-based document to the computing system;
- determine an indication whether the second network-based document was accessed, the indication determined, upon an execution of the code at the computing system, based at least in part on a determination of the presence or the absence of the identifier in the history or on a determination of the state or style attribute of the identifier if the identifier is present in the history; and
- determine, based at least in part on the indication, that the request for the information is associated with a network crawler hosted on the computing system.

Dependent claims: 6, 7, 8, 9, 10, 11, 12.
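One way the client-side code might derive its indication is from the rendered style of a link to the second document. The sketch below is only an illustration: `probeHistory`, its parameters, and the returned field names are hypothetical, and presence in the history is inferred here from the link's computed color differing from a known unvisited color. Current mainstream browsers deliberately limit what scripts can learn about visited links, so in practice this probe may always report absence.

```javascript
// Sketch only. In a browser, getStyle would typically be
// (el) => window.getComputedStyle(el); it is injected here so the
// logic can be exercised outside a browser.
function probeHistory(linkElement, getStyle, unvisitedColor) {
  const color = getStyle(linkElement).color;
  // Assumption: a color other than the unvisited one means the
  // identifier is present in the history.
  const present = color !== unvisitedColor;
  return {
    identifier: linkElement.href,
    presentInHistory: present,
    styleAttribute: color,
    secondDocumentAccessed: present,
  };
}
```

The returned object stands in for the indication that the claim describes; how it is sent back to the system (for instance, as a follow-up request) is not specified in this section.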
13. A system comprising:
- one or more processors;
- one or more computer-readable media comprising instructions that, when executed with the one or more processors, cause the system to at least:
  - receive, from a computing system, a request for information;
  - add, to a first network-based document comprising the information, code, and an identifier of a second network-based document, the first network-document and the second network-document associated with a same network-based resource of a provider, the code comprising statements of a programmatic scripting language in accordance with an ECMAScript standard and configured to, upon execution:
    - determine a presence or an absence of the identifier in a history stored at the computing system and, if the identifier is present in the history, one or more of: a state or a style attribute of the identifier from the history, and
    - determine whether the second network-based document was accessed prior to providing the first network-based document to the computing system;
  - provide the first network-based document to the computing system based at least in part on the request;
  - receive, from the computing system, an indication whether the second network-based document was accessed, the indication received based at least in part on a determination, upon an execution of the code at the computing system, of the presence or the absence of the identifier in the history or on a determination of the state or style attribute of the identifier if the identifier is present in the browser history; and
  - determine, based at least in part on the indication, that the request for the information is associated with a network crawler hosted on the computing system.

Dependent claims: 14, 15, 16, 17, 18, 19, 20.
Specification