INCREMENTAL CRAWLING OF MULTIPLE CONTENT PROVIDERS USING AGGREGATION
First Claim
1. A method for incremental crawling of content stored on a plurality of content providers using aggregation, the method comprising:
receiving a request to crawl content on one or more associated content providers;
retrieving one or more first references to content on a first content provider;
retrieving one or more second references to content on one or more second content providers during the same request;
aggregating the first and second references; and
returning the aggregated first and second references.
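The steps of this claim can be illustrated with a minimal sketch. It is not the patented implementation: the ContentProvider class, the get_references method, the shape of the request dictionary, and the sample reference strings are all hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ContentProvider:
    """Hypothetical provider: a name plus the content references it exposes."""
    name: str
    references: List[str]

    def get_references(self) -> List[str]:
        return list(self.references)


def crawl(request: Dict, providers: List[ContentProvider]) -> List[str]:
    """Serve a single crawl request by aggregating references from several providers."""
    wanted = set(request["providers"])
    selected = [p for p in providers if p.name in wanted]
    if not selected:
        return []
    first, *others = selected
    # First references, from the first content provider.
    first_refs = first.get_references()
    # Second references, from the remaining providers, within the same request.
    second_refs = [ref for p in others for ref in p.get_references()]
    # Aggregate both sets and return them together.
    return first_refs + second_refs


providers = [
    ContentProvider("mail", ["mail://1", "mail://2"]),
    ContentProvider("files", ["file://a"]),
]
result = crawl({"providers": ["mail", "files"]}, providers)
```

The caller sees one flat list of references and does not need to know how many providers were consulted to build it.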
Abstract
A method for incremental crawling of content stored on a plurality of content providers using aggregation is provided. The method comprises receiving a request to crawl content on one or more associated content providers; retrieving one or more first references to content on a first content provider; retrieving one or more second references to content on one or more second content providers during the same request; aggregating the first and second references; and returning the aggregated first and second references. The method takes into account an opaque timestamp object that is managed in a distributed manner: the content providers fill in the opaque timestamp, and the crawler side stores it between crawling sessions.
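One way to picture the opaque timestamp is as a token that only the issuing provider can interpret, while the crawler merely keeps it between sessions and hands it back. The sketch below assumes a JSON-encoded sequence number as the token and uses hypothetical Provider and Crawler classes; the abstract does not specify the token's contents or any of these names.

```python
import json
from typing import Dict, List, Optional, Tuple


class Provider:
    """Hypothetical provider that tracks changes with an internal sequence number."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._log: List[str] = []  # every reference ever published, in order

    def publish(self, ref: str) -> None:
        self._log.append(ref)

    def changes_since(self, token: Optional[str]) -> Tuple[List[str], str]:
        # Only the provider decodes its own token (assumed format: JSON).
        start = json.loads(token)["seq"] if token else 0
        new_token = json.dumps({"seq": len(self._log)})
        return self._log[start:], new_token


class Crawler:
    """Stores each provider's token between sessions without interpreting it."""

    def __init__(self, providers: List[Provider]) -> None:
        self.providers = providers
        self.tokens: Dict[str, str] = {}  # opaque blobs keyed by provider name

    def crawl(self) -> List[str]:
        aggregated: List[str] = []
        for provider in self.providers:
            refs, token = provider.changes_since(self.tokens.get(provider.name))
            aggregated.extend(refs)
            self.tokens[provider.name] = token  # kept for the next session
        return aggregated


mail, files = Provider("mail"), Provider("files")
crawler = Crawler([mail, files])
mail.publish("mail://1")
files.publish("file://a")
first_session = crawler.crawl()   # full crawl: no tokens stored yet
mail.publish("mail://2")
second_session = crawler.crawl()  # incremental crawl: only new content
```

Because the crawler never parses the token, each provider is free to use whatever change-tracking scheme suits it, which is what makes the timestamp management distributed in this sketch.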
20 Claims
1. A method for incremental crawling of content stored on a plurality of content providers using aggregation, the method comprising:
receiving a request to crawl content on one or more associated content providers;
retrieving one or more first references to content on a first content provider;
retrieving one or more second references to content on one or more second content providers during the same request;
aggregating the first and second references; and
returning the aggregated first and second references.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 12
11. A method for incremental crawling of content stored on a plurality of content providers using aggregation, the method comprising:
receiving a request to crawl content on one or more associated content providers, wherein the request comprises at least a starting index value and a range value;
forwarding the request to a first content provider on a list, in response to determining that there is no valid state information;
forwarding the request to a first content provider identified by the state information as a next content provider, in response to determining that there is valid state information;
receiving references, state information, or timing information from the first content provider;
aggregating the received references, state information, and timing information with other references, state information, and timing information, respectively;
forwarding the request to a second content provider on the list, in response to determining that the request has not been satisfied to the maximum extent possible;
updating the state information with the next content provider and corresponding next starting index, in response to determining that the request has been satisfied to the maximum extent possible; and
returning the aggregated references, the updated state information, and the aggregated timestamp, if available.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
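The control flow of claim 11 can be sketched as a loop over a provider list that resumes from saved state. This is an illustration under several assumptions that the claim itself does not fix: the State and Provider classes, the fetch method, the use of a short result to detect an exhausted provider, and the choice to apply the request's starting index only when no valid state exists are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class State:
    next_provider: int  # position of the next provider on the list
    next_index: int     # next starting index within that provider


@dataclass
class Provider:
    name: str
    refs: List[str]
    timestamp: str  # treated as opaque by the aggregator

    def fetch(self, start: int, count: int) -> Tuple[List[str], str]:
        return self.refs[start:start + count], self.timestamp


def crawl(providers: List[Provider], start_index: int, range_value: int,
          state: Optional[State] = None
          ) -> Tuple[List[str], State, Dict[str, str]]:
    # Valid state resumes at the recorded provider; otherwise start at the
    # first provider on the list with the request's starting index.
    valid = state is not None and 0 <= state.next_provider < len(providers)
    pos = state.next_provider if valid else 0
    index = state.next_index if valid else start_index
    refs: List[str] = []
    stamps: Dict[str, str] = {}
    # Keep forwarding until the range is filled or no providers remain.
    while pos < len(providers) and len(refs) < range_value:
        wanted = range_value - len(refs)
        got, stamp = providers[pos].fetch(index, wanted)
        refs.extend(got)
        stamps[providers[pos].name] = stamp
        if len(got) < wanted:   # provider exhausted: move to the next one
            pos, index = pos + 1, 0
        else:                   # request satisfied: remember where to resume
            index += len(got)
    return refs, State(pos, index), stamps


providers = [Provider("A", ["a0", "a1", "a2"], "tA"),
             Provider("B", ["b0", "b1"], "tB")]
page1, state1, stamps1 = crawl(providers, 0, 2)
page2, state2, stamps2 = crawl(providers, 0, 2, state1)
page3, state3, stamps3 = crawl(providers, 0, 2, state2)
```

In this sketch, a state that points past the end of the list is treated as invalid, so a crawl issued after the last page starts over at the first provider.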
Specification