System and method of accessing a document efficiently through multi-tier web caching

  • US 7,437,364 B1
  • Filed: 06/30/2004
  • Issued: 10/14/2008
  • Est. Priority Date: 06/30/2004
  • Status: Active Grant
First Claim

1. A computer-implemented method for accessing a document, comprising:

    identifying a set of documents from a database, the set of documents stored in a search engine repository containing documents obtained by a network crawler system, the database including information about each document in the set of documents;

    identifying documents in the set of documents that satisfy predefined criteria, the predefined criteria including a requirement that document content in the search engine repository for the identified documents has remained unchanged over multiple downloads by the network crawler system; and

    inserting, for each respective identified document, a respective entry in a cache index indicating that the document content of the respective identified document is suitable for retrieval from the search engine repository and for serving to respective clients, wherein each respective identified document has a respective URL and wherein the search engine repository is at a location that is independent of the respective URLs of the identified documents;

    receiving at a document server a request from a client, the request identifying a URL of a document;

    identifying at the document server a first document copy corresponding to the identified URL;

    determining whether the first document copy is stale;

    when the first document copy is determined not to be stale, serving to the client the first document copy from the document server;

    when the first document copy at the document server is determined to be stale, and a first condition is satisfied, the first condition including a requirement that the cache index include an entry for the identified URL that indicates that a repository copy of the document in the search engine repository is suitable for retrieval from the search engine repository and for serving to respective clients, retrieving the repository copy of the document from the search engine repository; and

    when the first document copy at the document server is determined to be stale, and the first condition is not satisfied, retrieving a host copy of the document.
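
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`CrawlRecord`, `build_cache_index`, `CachedCopy`, `DocumentServer`, `fetch_from_host`) is hypothetical, and several details are assumptions the claim does not specify: "unchanged over multiple downloads" is modeled as identical content checksums across the last `min_downloads` crawls, staleness is modeled as an expired time-to-live, and a missing local copy is treated the same way as a stale one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass
class CrawlRecord:
    """Hypothetical database row: checksums of one URL's content per crawl."""
    url: str
    checksums: List[str]


def build_cache_index(records: Iterable[CrawlRecord],
                      min_downloads: int = 3) -> Set[str]:
    """Insert an entry for each URL whose content stayed unchanged.

    Assumption: "unchanged over multiple downloads" means the last
    `min_downloads` checksums are all identical.
    """
    index: Set[str] = set()
    for rec in records:
        recent = rec.checksums[-min_downloads:]
        if len(recent) == min_downloads and len(set(recent)) == 1:
            index.add(rec.url)
    return index


@dataclass
class CachedCopy:
    """Copy held at the document server (TTL staleness is an assumption)."""
    content: str
    fetched_at: float
    ttl: float

    def is_stale(self, now: float) -> bool:
        return now - self.fetched_at > self.ttl


class DocumentServer:
    def __init__(self, cache_index: Set[str], repository: Dict[str, str],
                 fetch_from_host: Callable[[str], str]) -> None:
        self.cache_index = cache_index          # URLs deemed safe to serve
        self.repository = repository            # search engine repository
        self.fetch_from_host = fetch_from_host  # fallback to the web host
        self.local: Dict[str, CachedCopy] = {}  # document server's copies

    def serve(self, url: str, now: float) -> Tuple[str, str]:
        """Return (content, source tier) for the requested URL."""
        copy = self.local.get(url)
        # Not stale: serve the first document copy from the document server.
        if copy is not None and not copy.is_stale(now):
            return copy.content, "document server"
        # Stale (or, by assumption, missing) and the first condition holds:
        # the cache index has an entry, so use the repository copy.
        if url in self.cache_index and url in self.repository:
            return self.repository[url], "repository"
        # Stale and the first condition fails: retrieve the host copy.
        return self.fetch_from_host(url), "host"
```

In this sketch the cache index is a plain set of URLs, which reflects that the repository is addressed independently of each document's own URL: the index records only that a repository copy is suitable for serving, not where the original host lives.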
