Method and system for extending the performance of a web crawler
Abstract
A proxy engine (108), in communication with a web crawler (100), extends the performance of the web crawler (100) by modifying hyperlink requests and creating synthetic hyperlink requests from data received from a web site (104). The proxy engine (108) converts hyperlinks to a method used by a target web site. The proxy engine receives data from a web site (104) located across a network (102), and then determines whether additional data from the web site (104) is extractable. In response to determining that additional data from the web site (104) is extractable, the proxy engine (108) creates at least one synthetic hyperlink for extracting the data from the web site (104). The proxy engine (108) then combines the at least one synthetic hyperlink with the data received from the website (104) to create combined data and then sends the combined data to the crawler (100).
20 Claims
1. A method comprising the steps of:
receiving data from a web site located across a network;
determining whether additional data from the web site is extractable;
in response to determining that additional data from the web site is extractable, creating at least one synthetic hyperlink for extracting the data from the web site;
combining the at least one synthetic hyperlink with the data received from the website to create combined data; and
sending the combined data to a crawler.
(Dependent claims: 2, 3, 4, 5)
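The steps of claim 1 can be pictured with a minimal Python sketch. It is an illustration only, not the patented implementation: it assumes that "additional data is extractable" means the page contains a form with `<select>` options, and that a synthetic hyperlink is an ordinary `<a>` element whose query string carries the form field plus a hypothetical `__method` marker. The names `FormFinder`, `add_synthetic_links`, and `__method` do not appear in the source.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFinder(HTMLParser):
    """Collects each <form> with its action, method, and <select> options."""

    def __init__(self):
        super().__init__()
        self.forms = []      # one dict per form found
        self._select = None  # name of the <select> currently open

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": a.get("action") or "",
                "method": (a.get("method") or "get").upper(),
                "choices": [],  # (field name, option value) pairs
            })
        elif tag == "select" and self.forms:
            self._select = a.get("name")
        elif tag == "option" and self._select and a.get("value"):
            self.forms[-1]["choices"].append((self._select, a["value"]))

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def add_synthetic_links(page_html, page_url):
    """Return the page with one synthetic hyperlink appended per form option."""
    finder = FormFinder()
    finder.feed(page_html)
    links = []
    for form in finder.forms:
        # Assumes the form action has no query string of its own.
        target = urljoin(page_url, form["action"])
        for name, value in form["choices"]:
            # "__method" is a hypothetical marker recording the form's method.
            query = urlencode({name: value, "__method": form["method"]})
            href = escape(f"{target}?{query}", quote=True)
            links.append(f'<a href="{href}">{escape(name)}={escape(value)}</a>')
    if not links:
        return page_html  # nothing extra to extract; pass the page through
    return page_html + "\n" + "\n".join(links)
```

A crawler that only follows `<a>` elements would then discover the form results as if they were plain links, which is the effect the claim describes.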
6. A method comprising the steps of:
receiving a synthetic hyperlink request;
converting the synthetic hyperlink request to a method indicated by the synthetic hyperlink request to create a converted hyperlink request; and
sending the converted hyperlink request to a web site.
(Dependent claims: 7)
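As a rough illustration of the conversion step in claim 6, the sketch below decodes a synthetic hyperlink request into the HTTP method it indicates, the target URL, and the remaining fields. It assumes the method is carried in a hypothetical `__method` query parameter; the source does not say how a synthetic hyperlink encodes its method, and `convert_request` is an invented name. Sending the converted request is left out.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit

METHOD_MARKER = "__method"  # hypothetical marker name, not from the source


def convert_request(synthetic_url):
    """Return (method, url, fields) for a synthetic hyperlink request."""
    parts = urlsplit(synthetic_url)
    method = "GET"  # default when the link carries no marker
    fields = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name == METHOD_MARKER:
            method = value.upper()  # the method indicated by the link
        else:
            fields.append((name, value))
    # Rebuild the target URL without the query string or fragment.
    url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return method, url, fields
```

For example, `convert_request("http://example.com/search?state=NY&__method=POST")` yields the method `"POST"`, the URL `"http://example.com/search"`, and the single field `("state", "NY")`.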
8. A system comprising:
at least one communication device for receiving data from a web site and sending data to a crawler; and
at least one computer processing device, communicatively coupled to the at least one communication device, for analyzing data received from the web site to determine whether additional data is extractable from the web site by a hyperlink request, creating at least one synthetic hyperlink for extracting data from the web site, and combining the at least one synthetic hyperlink with data received from the web site in response to the hyperlink request.
(Dependent claims: 9, 10, 11, 12, 14, 16, 17, 18)
13. A system for extending the performance of a crawler comprising:
at least one communication device for receiving a synthetic hyperlink request and sending data to a web site across a network, in accordance with the synthetic hyperlink request; and
a method converter, communicatively coupled to the at least one communication device, for converting the synthetic hyperlink request to a method compatible with the web site.
15. A computer readable medium including computer instructions for a computing system, the computer instructions comprising instructions for:
receiving data from a web site located across a network;
determining whether additional data from the web site is extractable;
in response to determining that additional data from the web site is extractable, creating at least one synthetic hyperlink for extracting the data from the web site;
combining the at least one synthetic hyperlink with the data received from the web site to create combined data; and
sending the combined data to a crawler.
19. A computer readable medium including computer instruction for a computer system, the computer instructions comprising instructions for:
receiving a synthetic hyperlink request from a crawler;
converting the synthetic hyperlink request to a method indicated by the synthetic hyperlink request to create a converted hyperlink request; and
sending the converted hyperlink request to a web site.
20. A computer readable medium including computer instruction for a computer system, the computer instructions comprising instructions for:
receiving a synthetic hyperlink request for extracting data from a web site;
converting the synthetic hyperlink request to a method indicated by the synthetic hyperlink request; and
in response to determining that the synthetic hyperlink request indicates the use of a POST method, converting the synthetic hyperlink request from a GET method to a POST method.
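The GET-to-POST conversion in claim 20 might look like the following Python sketch, which moves the query string of a GET-style URL into the body of a POST request using the standard library's `urllib.request.Request`. It assumes the decision to use POST has already been made and that the query string is already percent-encoded ASCII; `get_to_post` is an illustrative name, not one from the source. The request is only constructed, not sent.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request


def get_to_post(get_url):
    """Build a POST request whose body is the query string of get_url."""
    parts = urlsplit(get_url)
    # The form fields travel in the request body instead of the URL.
    body = parts.query.encode("ascii")
    target = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return Request(
        target,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it to the target site, which this sketch deliberately avoids doing.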
Specification