System and method of accessing a document efficiently through multi-tier web caching

US 7,437,364 B1
Filed: 06/30/2004
Issued: 10/14/2008
Est. Priority Date: 06/30/2004
Status: Active Grant

Abstract

Upon receipt of a document request, a client assistant examines its cache for the document. If the document is not found there, a server searches for the requested document in its own cache. If the server copy is not fresh or is not found, the server seeks the document from its host. If the host cannot provide a copy, the server seeks it from a document repository. Certain documents in the document repository are identified as being fresh or stable. Information about each of these identified documents is transmitted to the server, which inserts an entry into an index if the index does not already contain an entry for the document. If and when such a document is requested, the document will not be present in the server; however, the server will contain an entry directing it to obtain the document from the document repository rather than from the document's web host.
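The lookup order described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `fetch`, the dictionary-based caches, the `is_fresh` predicate, and the `fetch_from_host` callable are all hypothetical stand-ins chosen for illustration, and the abstract does not say how freshness is decided.

```python
from typing import Callable, Dict, Optional, Tuple


def fetch(
    url: str,
    client_cache: Dict[str, str],
    server_cache: Dict[str, str],
    is_fresh: Callable[[str], bool],
    fetch_from_host: Callable[[str], Optional[str]],
    repository: Dict[str, str],
) -> Tuple[Optional[str], str]:
    """Return (document, tier) following the order given in the abstract.

    All parameters are illustrative assumptions; plain dicts stand in for
    the client cache, the server cache, and the document repository.
    """
    # Tier 1: the client assistant examines its own cache.
    if url in client_cache:
        return client_cache[url], "client-cache"

    # Tier 2: the server uses its cached copy only if that copy is fresh.
    cached = server_cache.get(url)
    if cached is not None and is_fresh(url):
        return cached, "server-cache"

    # Tier 3: the server seeks the document from its web host.
    body = fetch_from_host(url)
    if body is not None:
        return body, "host"

    # Tier 4: the host could not provide a copy, so fall back to the repository.
    body = repository.get(url)
    if body is not None:
        return body, "repository"

    return None, "miss"
```

In this sketch a host failure is modeled by `fetch_from_host` returning `None`; a real system would more likely distinguish timeouts, HTTP errors, and similar cases.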

283 Citations

6 Claims

1. A computer-implemented method for accessing a document, comprising:
- identifying a set of documents from a database, the set of documents stored in a search engine repository containing documents obtained by a network crawler system, the database including information about each document in the set of documents;
  
  identifying documents in the set of documents that satisfy predefined criteria, the predefined criteria including a requirement that document content in the search engine repository for the identified documents has remained unchanged over multiple downloads by the network crawler system; and
  
  inserting, for each respective identified document, a respective entry in a cache index indicating that the document content of the respective identified document is suitable for retrieval from the search engine repository and for serving to respective clients, wherein each respective identified document has a respective URL and wherein the search engine repository is at a location that is independent of the respective URLs of the identified documents;
  
  receiving at a document server a request from a client, the request identifying a URL of a document;
  
  identifying at the document server a first document copy corresponding to the identified URL;
  
  determining whether the first document copy is stale;
  
  when the first document copy is determined not to be stale, serving to the client the first document copy from the document server;
  
  when the first document copy at the document server is determined to be stale, and a first condition is satisfied, the first condition including a requirement that the cache index include an entry for the identified URL that indicates that a repository copy of the document in the search engine repository is suitable for retrieval from the search engine repository and for serving to respective clients, retrieving the repository copy of the document from the search engine repository; and
  
  when the first document copy at the document server is determined to be stale, and the first condition is not satisfied, retrieving a host copy of the document.
- 2. The computer-implemented method of claim 1, including, when the first document copy at the document server is determined to be stale, and the first condition is satisfied, serving the repository copy of the document to the client.
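The first three elements of claim 1 (identifying documents whose content stayed unchanged over multiple downloads, then inserting cache index entries for them) could be sketched roughly as follows. This is only one possible reading: the claim does not say how "unchanged" is detected or how many downloads count as "multiple", so the SHA-256 fingerprint, the `min_downloads` threshold of 3, and the names `is_stable` and `populate_cache_index` are assumptions made for illustration. The insert-only-if-absent behavior follows the abstract.

```python
import hashlib
from typing import Dict, List


def content_fingerprint(content: bytes) -> str:
    """Hash of the downloaded content (assumed way to compare downloads)."""
    return hashlib.sha256(content).hexdigest()


def is_stable(fingerprints: List[str], min_downloads: int = 3) -> bool:
    """True when the most recent `min_downloads` crawls returned identical content.

    `min_downloads` is a hypothetical threshold; the claim only says "multiple".
    """
    if len(fingerprints) < min_downloads:
        return False
    return len(set(fingerprints[-min_downloads:])) == 1


def populate_cache_index(
    crawl_history: Dict[str, List[str]],
    cache_index: Dict[str, str],
    min_downloads: int = 3,
) -> int:
    """Insert an entry for each stable URL that is not yet indexed.

    `crawl_history` stands in for the claimed database: it maps a URL to the
    fingerprints of its successive downloads, oldest first. Returns the
    number of entries added.
    """
    added = 0
    for url, fingerprints in crawl_history.items():
        if is_stable(fingerprints, min_downloads) and url not in cache_index:
            # The entry marks the repository copy as suitable for serving.
            cache_index[url] = "repository"
            added += 1
    return added
```

Keying the index by URL while storing only a marker, not the content, mirrors the claim's point that the repository's location is independent of the documents' URLs.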

3. A computer readable storage medium storing one or more computer programs to be executed by one or more computers, the one or more computer programs comprising:
- instructions for storing in a database information about a set of documents, the set of documents stored in a search engine repository containing documents obtained by a network crawler system;
  
  instructions for identifying documents in the set of documents that satisfy predefined criteria, the predefined criteria including a requirement that document content in the search engine repository for the identified documents has remained unchanged over multiple downloads by the network crawler system; and
  
  instructions for inserting, for each respective identified document, a respective entry in a cache index indicating that the document content of the respective identified document is suitable for retrieval from the search engine repository and for serving to respective clients, wherein each respective identified document has a respective URL, and wherein the search engine repository is at a location that is independent of the respective URLs of the identified documents;
  
  instructions for receiving at a document server a request from a client, the request identifying a URL of a document;
  
  instructions for identifying at the document server a first document copy corresponding to the identified URL;
  
  instructions for determining whether the first document copy is stale; and
  
  document retrieval instructions that, when the first document copy at the document server is determined not to be stale, serve to the client the first document copy from the document server;
  
  when the first document copy at the document server is determined to be stale, and a first condition is satisfied, the first condition including a requirement that the cache index include an entry for the identified URL that indicates that a repository copy of the document in the search engine repository is suitable for retrieval from the search engine repository and for serving to respective clients, retrieve the repository copy of the document from the search engine repository; and
  
  when the first document copy at the document server is determined to be stale, and the first condition is not satisfied, retrieve a host copy of the document.
- 4. The computer readable storage medium of claim 3, the one or more computer programs including document serving instructions for serving the repository copy of the document to the client when the first document copy at the document server is determined to be stale, and the first condition is satisfied.
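The three-way branch recited for the document retrieval instructions can be written as a small decision function. This is a sketch under stated assumptions, not the claimed instructions themselves: `Source` and `choose_source` are hypothetical names, and the sketch treats the cache index entry as the whole of the "first condition", whereas the claim only says the condition includes that requirement.

```python
from enum import Enum
from typing import Dict


class Source(Enum):
    DOCUMENT_SERVER = "document server"
    REPOSITORY = "search engine repository"
    HOST = "host"


def choose_source(
    url: str,
    first_copy_is_stale: bool,
    cache_index: Dict[str, bool],
) -> Source:
    """Pick where the document copy comes from for a requested URL.

    `cache_index[url]` being True is assumed to mean "the repository copy is
    suitable for retrieval and for serving to clients".
    """
    # Not stale: serve the first document copy held at the document server.
    if not first_copy_is_stale:
        return Source.DOCUMENT_SERVER
    # Stale and first condition satisfied: retrieve the repository copy.
    if cache_index.get(url, False):
        return Source.REPOSITORY
    # Stale and first condition not satisfied: retrieve a host copy.
    return Source.HOST
```

Under claim 4, the `REPOSITORY` outcome is followed by serving that repository copy to the client.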

5. A system for accessing a document comprising at least one computer, the at least one computer including:
- a database containing information about each document in a set of documents stored in a search engine repository, the search engine repository containing documents obtained by a network crawler system;
  
  an identification unit that identifies documents in the set of documents that satisfy predefined criteria, the predefined criteria including a requirement that document content in the search engine repository for the identified documents has remained unchanged over multiple downloads by the network crawler system;
  
  an insertion unit that inserts, for each respective identified document, a respective entry in a cache index indicating that the document content of the respective identified document is suitable for retrieval from the search engine repository and for serving to respective clients, each respective identified document having a respective URL, the search engine repository being at a location that is independent of the respective URLs of the identified documents;
  
  a receiving unit that receives at a document server a request from a client, the request identifying a URL of a document;
  
  a lookup module that identifies at the document server a first document copy corresponding to the identified URL, and determines whether the first document copy is stale; and
  
  a document retrieving unit that, when the first document copy at the document server is determined not to be stale, serves to the client the first document copy from the document server;
  
  when the first document copy at the document server is determined to be stale, and a first condition is satisfied, the first condition including a requirement that the cache index include an entry for the identified URL that indicates that a repository copy of the document in the search engine repository is suitable for retrieval from the search engine repository and for serving to respective clients, retrieves the repository copy of the document from the search engine repository; and
  
  when the first document copy at the document server is determined to be stale, and the first condition is not satisfied, retrieves a host copy of the document.
- 6. The system of claim 5, including a document serving unit that serves the repository copy of the document to the client when the first document copy at the document server is determined to be stale, and the first condition is satisfied.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Ghemawat, Sanjay; Harik, Georges; Provos, Niels; Fredricksen, Eric Russell; Schneider, Fritz John; Dean, Jeffrey Adgate
Primary Examiner(s): Rones, Charles
Assistant Examiner(s): Cao, Phuong Thao
Application Number: US10/882,795
Time in Patent Office: 1,567 Days
Field of Search: 707/1, 707/2, 707/10, 707/200, 709/201, 709/203
US Class Current: 1/1
CPC Class Codes:
- G06F 16/93: Document management systems
- G06F 16/9574: of access to content, e.g. ...
- Y10S 707/99931: Database or file accessing
- Y10S 707/99932: Access augmentation or opti...