Document reuse in a search engine crawler

  • US 8,707,312 B1
  • Filed: 06/30/2004
  • Issued: 04/22/2014
  • Est. Priority Date: 07/03/2003
  • Status: Active Grant
First Claim

1. A method of scheduling a document for crawling by a search engine crawler, comprising:

  • at a server system having one or more processors and memory storing one or more programs executed by the one or more processors:

    retrieving a plurality of records corresponding to prior scheduled crawls of the document, each record comprising a set of entries, the set of entries comprising a document identifier, a crawl time, and a crawl type;

    setting a reuse flag for the document based, at least in part, on information in the retrieved plurality of records;

    outputting the reuse flag to a scheduler record, the scheduler record comprising a document identifier and the reuse flag; and

    when performing a document crawling operation, and the reuse flag for the document has a predefined reuse value, reusing a previously downloaded version of the document instead of downloading a current version of the document from a host server, wherein reusing the previously downloaded version of the document includes:

    at the server system, while indexing documents obtained from the document crawling operation to build a search index, indexing the document having the reuse flag with the predefined reuse value by indexing the previously downloaded version of the document, wherein the search index includes a plurality of potential search terms and documents associated with potential search terms.
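
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim says only that the reuse flag is set "based, at least in part, on information in the retrieved plurality of records", so the rule used here (reuse the stored copy unless it has already been reused MAX_CONSECUTIVE_REUSES times in a row) is an assumption made for illustration. All names (HistoryRecord, SchedulerRecord, set_reuse_flag, schedule, crawl, build_index), the crawl type strings "download" and "reuse", and the in-memory cache are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set


@dataclass(frozen=True)
class HistoryRecord:
    """One prior scheduled crawl: document identifier, crawl time, crawl type."""
    doc_id: str
    crawl_time: float
    crawl_type: str  # hypothetical values: "download" or "reuse"


@dataclass(frozen=True)
class SchedulerRecord:
    """Scheduler output: document identifier plus the reuse flag."""
    doc_id: str
    reuse: bool


# Hypothetical policy constant; the claim does not specify any threshold.
MAX_CONSECUTIVE_REUSES = 3


def set_reuse_flag(records: Iterable[HistoryRecord]) -> bool:
    """Assumed rule: allow reuse unless the most recent crawls were all reuses."""
    ordered = sorted(records, key=lambda r: r.crawl_time, reverse=True)
    if not ordered:
        return False  # no history, so the document must be downloaded
    streak = 0
    for record in ordered:  # newest first
        if record.crawl_type != "reuse":
            break
        streak += 1
    return streak < MAX_CONSECUTIVE_REUSES


def schedule(doc_id: str, history: Iterable[HistoryRecord]) -> SchedulerRecord:
    """Retrieve the document's records and emit a scheduler record."""
    own = [r for r in history if r.doc_id == doc_id]
    return SchedulerRecord(doc_id, set_reuse_flag(own))


def crawl(record: SchedulerRecord, cache: Dict[str, str],
          fetch: Callable[[str], str]) -> str:
    """Return the stored copy when the flag is set, else download and store."""
    if record.reuse and record.doc_id in cache:
        return cache[record.doc_id]
    content = fetch(record.doc_id)
    cache[record.doc_id] = content
    return content


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Tiny inverted index mapping each search term to the documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


if __name__ == "__main__":
    history = [HistoryRecord("d1", 1.0, "download"),
               HistoryRecord("d1", 2.0, "reuse")]
    cache = {"d1": "previously downloaded text"}
    record = schedule("d1", history)
    content = crawl(record, cache, fetch=lambda doc_id: "current text")
    print(record, build_index({"d1": content}))
```

In this sketch the scheduler record carries only the document identifier and the flag, mirroring the claim. The crawl step consults the flag before contacting the host, and the indexer then indexes whatever content the crawl step returned, which for a flagged document is the previously downloaded version.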

  • 2 Assignments