Managing items in crawl schedule
Abstract
Determining a schedule for recrawling pages is disclosed. A crawling schedule that specifies a due date at which each page is to be crawled is determined according to a first scheme. A set of pages that includes one or more pages each of which has a due date that has passed is determined. The set of pages is ordered according to a second scheme.
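The two schemes described in the abstract can be illustrated with a minimal sketch. It is not the disclosed implementation: the `Page` fields, the rule that a page falls due one change period after its last crawl, and the ordering by relative lateness are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    url: str
    last_crawled: float   # time of the last crawl, in seconds (assumed unit)
    change_period: float  # estimated seconds between changes (assumed field)


def due_date(page: Page) -> float:
    # First scheme (assumed): a page is due one change period after its last crawl.
    return page.last_crawled + page.change_period


def overdue_pages(pages: List[Page], now: float) -> List[Page]:
    # The set of pages whose due date has passed.
    return [p for p in pages if due_date(p) < now]


def order_by_lateness(pages: List[Page], now: float) -> List[Page]:
    # Second scheme (assumed): pages that are most overdue relative to
    # their own change period are placed first.
    return sorted(
        pages,
        key=lambda p: (now - due_date(p)) / p.change_period,
        reverse=True,
    )


pages = [
    Page("a", last_crawled=0, change_period=100),
    Page("b", last_crawled=900, change_period=50),
    Page("c", last_crawled=950, change_period=200),  # not yet due at now=1000
    Page("d", last_crawled=100, change_period=300),
]
ordered = order_by_lateness(overdue_pages(pages, now=1000), now=1000)
print([p.url for p in ordered])
```

Keeping the due-date rule and the ordering rule in separate functions mirrors the abstract's distinction between a first scheme and a second scheme, so either one could be swapped without touching the other.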
20 Claims
1. A method of determining a schedule for recrawling items including:
estimating, using a processor of one or more devices, a change period for each of a set of items to be crawled;
generating, using a processor of the one or more devices, a crawl list of items where each item on the crawl list is overdue to be crawled in accordance with the change period for the item and with when the item was last crawled;
selecting, using a processor of the one or more devices, a sorting method from multiple different sorting methods based on one or more different factors; and
sorting, using a processor of the one or more devices, the crawl list prior to crawling the items on the crawl list using the selected sorting method.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system for crawling items in a network comprising:
one or more processors configured to:
estimate a change period for each of a set of items to be crawled;
generate a crawl list of items where each item on the crawl list is overdue to be crawled in accordance with the change period for the item and with when the item was last crawled;
select a sorting method from multiple different sorting methods based on one or more different factors; and
sort the crawl list prior to crawling the items on the crawl list using the selected sorting method.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A search appliance comprising:
a crawl scheduler for estimating a change period for each of a set of items to be crawled; and
a crawl manager for generating a crawl list of items where each item on the crawl list is overdue to be crawled in accordance with the change period for the item and with when the item was last crawled, for selecting a sorting method from multiple different sorting methods based on one or more different factors and for sorting the crawl list prior to crawling the items on the crawl list using the selected sorting method.
Dependent claims: 16, 17, 18, 19, 20
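The division of work between a crawl scheduler and a crawl manager, and the step of selecting one sorting method from several, might be sketched as follows. Everything specific here is hypothetical: the claims do not name the estimator, the candidate sorting methods, or the factors, so the naive averaging estimator, the two methods in `SORT_METHODS`, and the backlog-versus-capacity factor are assumptions made only so the example runs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Item:
    url: str
    last_crawled: float
    change_period: float = 0.0
    importance: float = 0.0  # hypothetical ranking signal, not from the claims


class CrawlScheduler:
    """Estimates a change period for each item (naive estimator, assumed)."""

    def estimate(self, item: Item, observed_span: float, changes_seen: int) -> None:
        # Average time between observed changes; if no change has been
        # seen yet, fall back to the whole observation span.
        item.change_period = observed_span / max(changes_seen, 1)


# Candidate sorting methods (assumed). Each returns a sort key for an item;
# smaller keys are crawled first.
SORT_METHODS: Dict[str, Callable[[Item, float], float]] = {
    "most_overdue": lambda it, now: -(now - (it.last_crawled + it.change_period)),
    "most_important": lambda it, now: -it.importance,
}


class CrawlManager:
    def crawl_list(self, items: List[Item], now: float) -> List[Item]:
        # An item is overdue when its last crawl plus its change period has passed.
        return [it for it in items if it.last_crawled + it.change_period < now]

    def select_method(self, backlog: int, capacity: int) -> str:
        # Hypothetical factor: if more items are overdue than can be crawled,
        # favor important items; otherwise crawl the most overdue first.
        return "most_important" if backlog > capacity else "most_overdue"

    def sorted_crawl_list(self, items: List[Item], now: float, capacity: int) -> List[Item]:
        overdue = self.crawl_list(items, now)
        key_fn = SORT_METHODS[self.select_method(len(overdue), capacity)]
        return sorted(overdue, key=lambda it: key_fn(it, now))
```

Under these assumptions, the same overdue set comes out in a different order depending on the selected method: with ample capacity the most overdue item leads, and with a backlog the most important item leads.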
Specification