Minimizing visibility of stale content in web searching including revising web crawl intervals of documents

  • US 8,407,204 B2
  • Filed: 06/22/2011
  • Issued: 03/26/2013
  • Est. Priority Date: 08/30/2004
  • Status: Active Grant
First Claim

1. A computer-implemented method for scheduling documents to be crawled by a search engine in an appropriate order to reduce visibility of stale content in web searching, comprising:

  • on a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors;

    associating with each of a plurality of documents a respective initial web crawl interval;

    partitioning the plurality of documents into a plurality of tiers according to their respective web crawl intervals, each tier in the plurality of tiers having a distinct associated range of web crawl intervals, including storing data for each tier identifying documents in the plurality of documents assigned to that tier in accordance with the documents' respective web crawl intervals;

    associating a revised web crawl interval with a respective document of the plurality of documents, including updating the web crawl interval of the respective document to be less than the initial web crawl interval when the respective document's content has changed, and updating the web crawl interval to be more than the initial web crawl interval when the respective document's content has not changed; and

    moving the respective document between tiers of the plurality of tiers when the respective revised web crawl interval of the respective document is associated with a different tier of the plurality of tiers than a previous web crawl interval of the respective document.
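
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The tier boundaries in `TIER_BOUNDS`, the halving/doubling `factor`, and the names `CrawlScheduler`, `tier_for`, `add`, and `record_crawl` are all assumptions made for illustration; the claim itself only requires that each tier cover a distinct range of intervals, that the interval shrink when content has changed and grow when it has not, and that a document move when its revised interval falls in a different tier.

```python
# Hypothetical tier ranges in hours, as [low, high) pairs. The claim only
# requires that each tier have a distinct range of web crawl intervals.
TIER_BOUNDS = [(0.0, 24.0), (24.0, 168.0), (168.0, float("inf"))]


def tier_for(interval_hours):
    """Return the index of the tier whose range contains the interval."""
    for index, (low, high) in enumerate(TIER_BOUNDS):
        if low <= interval_hours < high:
            return index
    raise ValueError(f"no tier covers interval {interval_hours!r}")


class CrawlScheduler:
    """Illustrative scheduler: per-document intervals plus per-tier membership."""

    def __init__(self):
        self.intervals = {}                         # document URL -> interval (hours)
        self.tiers = [set() for _ in TIER_BOUNDS]   # per-tier document sets

    def add(self, url, initial_interval):
        """Associate an initial interval with a document and file it in a tier."""
        if initial_interval <= 0:
            raise ValueError("interval must be positive")
        self.intervals[url] = initial_interval
        self.tiers[tier_for(initial_interval)].add(url)

    def record_crawl(self, url, changed, factor=2.0):
        """Revise the interval after a crawl and move tiers if needed.

        Assumption: the interval is divided by `factor` when the content
        changed and multiplied by it otherwise. The claim does not specify
        how much shorter or longer the revised interval should be.
        """
        if factor <= 1.0:
            raise ValueError("factor must be greater than 1")
        old = self.intervals[url]
        new = old / factor if changed else old * factor
        old_tier, new_tier = tier_for(old), tier_for(new)
        self.intervals[url] = new
        if new_tier != old_tier:
            # The revised interval belongs to a different tier: move the document.
            self.tiers[old_tier].discard(url)
            self.tiers[new_tier].add(url)
        return new
```

For example, a document added with a 30-hour interval starts in the middle tier; after one crawl that finds changed content, `record_crawl(url, changed=True)` revises the interval to 15 hours and moves the document into the first tier.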
