
Minimizing visibility of stale content in web searching including revising web crawl intervals of documents

  • US 8,782,032 B2
  • Filed: 03/22/2013
  • Issued: 07/15/2014
  • Est. Priority Date: 08/30/2004
  • Status: Expired due to Fees
First Claim

1. A method for scheduling a document crawl interval, comprising:

  • at a computer system having one or more processors and a memory storing one or more programs for execution by the one or more processors;

    comparing a first instance of a document in a plurality of documents with a second instance of the document, thereby obtaining a document comparison, wherein the first instance of the document is obtained from a remote location at a specified time before the second instance of the document is obtained from the remote location and wherein (i) the specified time is determined in accordance with a first crawl interval associated with the document, (ii) each document in the plurality of documents is assigned to a crawl-scheduling tier in a plurality of crawl-scheduling tiers, each crawl-scheduling tier in the plurality of crawl-scheduling tiers having a distinct associated range of web crawl intervals, and (iii) the first crawl interval is assigned a first crawl-scheduling tier in the plurality of crawl-scheduling tiers; and

    computing a second crawl interval for the document, wherein the second crawl interval is a function of the document comparison; and

    determining whether the second crawl interval is in the crawl-scheduling first tier, wherein, when the second crawl interval is not in the crawl-scheduling first tier, the first document is reassigned to a crawl-scheduling tier in the plurality of crawl-scheduling tiers other than the first crawl-scheduling tier.
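The claimed steps are: compare two fetched instances of a document, compute a new crawl interval from that comparison, then check whether the new interval still falls in the document's current tier and reassign the document if it does not. They can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation. The claim does not say how documents are compared or how the interval is derived, so the SHA-256 equality check, the halve-on-change / double-on-no-change rule, the 1 to 720 hour bounds, and the tier names and ranges (`daily`, `weekly`, `monthly`) are all hypothetical choices.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical tiers: (name, inclusive lower bound, exclusive upper bound), in hours.
# Each tier owns a distinct range of crawl intervals, as the claim requires.
TIERS = [
    ("daily", 0.0, 24.0),
    ("weekly", 24.0, 168.0),
    ("monthly", 168.0, float("inf")),
]
MIN_INTERVAL_H = 1.0    # assumed floor on the crawl interval
MAX_INTERVAL_H = 720.0  # assumed ceiling (30 days)


@dataclass
class Document:
    url: str
    crawl_interval_h: float  # current ("first") crawl interval
    tier: str                # current crawl-scheduling tier


def content_changed(first: bytes, second: bytes) -> bool:
    """Assumed document comparison: do the two instances' SHA-256 digests differ?"""
    return hashlib.sha256(first).digest() != hashlib.sha256(second).digest()


def next_interval(current_h: float, changed: bool) -> float:
    """Assumed rule: halve the interval if the page changed, double it if not, then clamp."""
    proposed = current_h / 2 if changed else current_h * 2
    return max(MIN_INTERVAL_H, min(MAX_INTERVAL_H, proposed))


def tier_for(interval_h: float) -> str:
    """Return the name of the tier whose range contains interval_h."""
    for name, low, high in TIERS:
        if low <= interval_h < high:
            return name
    raise ValueError(f"no tier covers interval {interval_h}")


def reschedule(doc: Document, first: bytes, second: bytes) -> bool:
    """Compute the second crawl interval and reassign the tier if needed.

    Returns True when the document moved to a different tier.
    """
    changed = content_changed(first, second)
    doc.crawl_interval_h = next_interval(doc.crawl_interval_h, changed)
    new_tier = tier_for(doc.crawl_interval_h)
    if new_tier != doc.tier:
        doc.tier = new_tier
        return True
    return False


if __name__ == "__main__":
    doc = Document("https://example.com/page", 16.0, "daily")  # hypothetical URL
    reschedule(doc, b"v1", b"v1")  # unchanged: interval doubles to 32 h
    print(doc.crawl_interval_h, doc.tier)  # prints: 32.0 weekly
```

A binary changed/unchanged comparison is the crudest option. A graded similarity score could instead scale the adjustment, and that would still satisfy the claim's requirement that the second interval be a function of the document comparison.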
