Managing URLs
First Claim
1. A method of crawling pages including:
crawling pages up to a target number of pages, at least a subset of which are not constrained to have an importance;
crawling additional pages beyond the target number of pages, wherein the additional pages are constrained to have an importance that is equal to or greater than an importance threshold; and
providing as output for each of at least a subset of the crawled pages and additional pages a crawl data associated with the respective page;
wherein an importance is a query independent metric associated with the page.
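The two phases recited in this claim can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the `Page` record, the `crawl` function, the in-memory `frontier` sequence, and the precomputed `importance` scores are all hypothetical names introduced for the example. The claim itself only requires that importance be a query independent metric.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Page:
    """Hypothetical record for a candidate page (not defined in the source)."""
    url: str
    importance: float  # assumed to be a precomputed, query-independent score


def crawl(frontier: Iterable[Page], target: int, threshold: float) -> List[dict]:
    """Two-phase crawl: unconstrained up to `target`, then gated by importance."""
    crawl_data: List[dict] = []
    for page in frontier:
        if len(crawl_data) < target:
            # Phase 1: below the target count, importance is not checked.
            crawl_data.append({"url": page.url, "importance": page.importance})
        elif page.importance >= threshold:
            # Phase 2: beyond the target, keep only pages at or above the threshold.
            crawl_data.append({"url": page.url, "importance": page.importance})
    return crawl_data


# Usage: with a target of 2, the first two pages are taken regardless of score;
# after that, only pages scoring at least 0.5 are crawled.
frontier = [Page("a", 0.1), Page("b", 0.9), Page("c", 0.2), Page("d", 0.7), Page("e", 0.5)]
crawled_urls = [entry["url"] for entry in crawl(frontier, target=2, threshold=0.5)]
```

The example fetches nothing over the network; each output dictionary stands in for the "crawl data associated with the respective page".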
Abstract
Crawling pages is disclosed. Pages are crawled up to a target number of pages. Additional pages that have an importance equal to or greater than an importance threshold are crawled beyond the target number of pages. In some embodiments, pages having an importance less than an importance threshold are deleted.
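The deletion embodiment mentioned in the abstract might look like the minimal sketch below. The dictionary mapping URLs to importance scores and the `prune` function name are assumptions made for illustration; the abstract does not say how crawled pages are stored or when deletion runs.

```python
from typing import Dict


def prune(store: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Return a copy of `store` without pages whose importance is below `threshold`."""
    # `store` is a hypothetical map of URL -> query-independent importance score.
    return {url: score for url, score in store.items() if score >= threshold}


# Usage: the page scoring 0.1 falls below the 0.5 threshold and is dropped.
kept = prune({"a": 0.1, "b": 0.5, "c": 0.9}, threshold=0.5)
```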
27 Claims
1. A method of crawling pages including:
crawling pages up to a target number of pages, at least a subset of which are not constrained to have an importance;
crawling additional pages beyond the target number of pages, wherein the additional pages are constrained to have an importance that is equal to or greater than an importance threshold; and
providing as output for each of at least a subset of the crawled pages and additional pages a crawl data associated with the respective page;
wherein an importance is a query independent metric associated with the page.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A system for crawling pages comprising:
a processor, coupled to a memory, configured to:
crawl pages up to a target number of pages, at least a subset of which are not constrained to have an importance;
crawl additional pages beyond the target number of pages, wherein the additional pages are constrained to have an importance that is greater than an importance threshold; and
provide as output for each of at least a subset of the crawled pages and additional pages a crawl data associated with the respective page;
wherein an importance is a query independent metric associated with the page; and
a memory coupled to the processor, wherein the memory provides the processor with instructions.
Dependent claims: 14, 15, 16, 17, 18
19. A computer program product for crawling pages, the computer program product being embodied in a computer readable medium and comprising computer instructions for:
crawling pages up to a target number of pages, at least a subset of which are not constrained to have an importance;
crawling additional pages beyond the target number of pages, wherein the additional pages are constrained to have an importance that is greater than an importance threshold; and
providing as output for each of at least a subset of the crawled pages and additional pages a crawl data associated with the respective page;
wherein an importance is a query independent metric associated with the page.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27
Specification