System and method of accessing a document efficiently through multi-tier web caching

US 8,275,790 B2
Filed: 10/14/2008
Issued: 09/25/2012
Est. Priority Date: 06/30/2004
Status: Active Grant

Abstract

Upon receipt of a document request, a client assistant examines its cache for the document. If not successful, a server searches for the requested document in its cache. If the server copy is still not fresh or not found, the server seeks the document from its host. If the host cannot provide the copy, the server seeks it from a document repository. Certain documents are identified from the document repository as being fresh or stable. Information about each of these identified documents is transmitted to the server, which inserts entries into an index if the index does not already contain an entry for the document. If and when this particular document is requested, the document will not be present in the server; however, the server will contain an entry directing the server to obtain the document from the document repository rather than the document's web host.
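The lookup order described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the function name `fetch_document`, the dictionary-based caches, and the callable parameters are all hypothetical stand-ins for the client assistant, the server, the web host, and the document repository.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical fetcher type: returns the document body, or None if unavailable.
Fetcher = Callable[[str], Optional[str]]


def fetch_document(
    url: str,
    client_cache: Dict[str, str],
    server_cache: Dict[str, str],
    server_copy_is_fresh: Callable[[str], bool],
    fetch_from_host: Fetcher,
    fetch_from_repository: Fetcher,
) -> Optional[Tuple[str, str]]:
    """Return (tier, document) for the first tier that can supply the document."""
    # Tier 1: the client assistant examines its own cache.
    if url in client_cache:
        return ("client_cache", client_cache[url])
    # Tier 2: the server uses its cached copy only if that copy is fresh.
    if url in server_cache and server_copy_is_fresh(url):
        return ("server_cache", server_cache[url])
    # Tier 3: the server seeks the document from its web host.
    document = fetch_from_host(url)
    if document is not None:
        return ("host", document)
    # Tier 4: if the host cannot provide it, fall back to the repository.
    document = fetch_from_repository(url)
    if document is not None:
        return ("repository", document)
    return None
```

Passing the tiers in as callables keeps the sketch independent of any particular network or storage layer.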

225 Citations

42 Claims

1. A method of accessing a document, comprising:
- receiving at a document server a request from a client, the request identifying a URL of a document;
  
  identifying at the document server a first document copy corresponding to the URL, and determining whether the first document copy is stale;
  
  when the first document copy at the document server is determined to be stale, determining at the document server whether a first condition is satisfied using document freshness information stored at the document server;
  
  the first condition including a freshness condition with respect to a repository copy of the document in a search engine repository; and
  
  when the first document copy at the document server is determined to be stale and the first condition is satisfied, retrieving the repository copy of the document from the search engine repository; and
  
  when the first document copy at the document server is determined to be stale, and the first condition is not satisfied, retrieving a host copy of the document;
  
  wherein the search engine repository is at a location that is independent of the URL and distinct from the document server; and
  
  wherein the search engine repository contains documents, including the repository copy of the document, obtained by a network crawler system.
- - 2. The method of claim 1, including serving to the client the first document copy from the document server when the first document copy is determined not to be stale.
  - 3. The method of claim 1, wherein the first condition includes an availability condition with respect to a host associated with the URL.
  - 4. The method of claim 1, including serving the repository copy of the document to the client when the first document copy at the document server is determined to be stale and the first condition is satisfied.
  - 5. The method of claim 1, wherein the determining whether the first document copy is stale includes determining whether a freshness period will be exceeded within a predetermined time.
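The source-selection logic of claims 1, 2 and 5 might be sketched as below. This is a minimal sketch under stated assumptions: `CachedCopy`, `FreshnessInfo`, `is_stale`, `choose_source` and the `lookahead` parameter are hypothetical names, and timestamps are assumed to be plain seconds. The claims do not prescribe any of these details.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedCopy:
    """Hypothetical record for the first document copy held at the document server."""
    content: str
    fetched_at: float        # seconds since some epoch
    freshness_period: float  # seconds for which the copy counts as fresh


@dataclass
class FreshnessInfo:
    """Hypothetical freshness information stored at the document server."""
    repository_copy_fresh: bool


def is_stale(copy: CachedCopy, now: float, lookahead: float = 0.0) -> bool:
    # Claim 5: treat the copy as stale if its freshness period will be
    # exceeded within a predetermined time (modeled here as `lookahead`).
    return now + lookahead > copy.fetched_at + copy.freshness_period


def choose_source(copy: CachedCopy, info: Optional[FreshnessInfo],
                  now: float, lookahead: float = 0.0) -> str:
    # Claim 2: a copy that is not stale is served from the document server.
    if not is_stale(copy, now, lookahead):
        return "server_cache"
    # Claim 1: stale copy and freshness condition satisfied -> repository copy.
    if info is not None and info.repository_copy_fresh:
        return "repository"
    # Claim 1: stale copy and first condition not satisfied -> host copy.
    return "host"
```

For example, a copy fetched at `t = 1000` with a 60-second freshness period is still served from the server cache at `t = 1030`, but with a 45-second lookahead the same request would be routed to the repository or the host.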

6. A method of accessing a document, comprising:
- receiving at a document server a request from a client, the request identifying a URL of a document;
  
  identifying at the document server a first document copy corresponding to the URL, and determining whether the first document copy is stale;
  
  when the first document copy at the document server is determined to be stale, and a first condition is satisfied, the first condition including a freshness condition with respect to a repository copy of the document in a search engine repository and a stability condition with respect to the repository copy of the document, wherein the stability condition includes a condition that content of the document has remained unchanged over multiple downloads by a network crawler system, retrieving the repository copy of the document from the search engine repository; and
  
  when the first document copy at the document server is determined to be stale, and the first condition is not satisfied, retrieving a host copy of the document;
  
  wherein the search engine repository is at a location that is independent of the URL and distinct from the document server; and
  
  wherein the search engine repository contains documents, including the repository copy of the document, obtained by the network crawler system;
  
  satisfying the first condition includes satisfying both the freshness condition and the stability condition; and
  
  determining whether the document is stale and the first condition is satisfied is performed at the document server.
- - 7. The method of claim 6, including serving to the client the first document copy from the document server when the first document copy is determined not to be stale.
  - 8. The method of claim 6, wherein the first condition includes an availability condition with respect to a host associated with the URL.
  - 9. The method of claim 6, including serving the repository copy of the document to the client when the first document copy at the document server is determined to be stale and the first condition is satisfied.
  - 10. The method of claim 6, wherein the determining whether the first document copy is stale includes determining whether a freshness period will be exceeded within a predetermined time.
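One way to picture the stability condition of claim 6 is to compare content fingerprints recorded across successive crawler downloads. The sketch below is an assumption-laden illustration: the use of SHA-256, the `min_downloads` threshold, and the function names are hypothetical, since the claim only requires that content has remained unchanged over multiple downloads.

```python
import hashlib
from typing import Sequence


def content_fingerprint(content: bytes) -> str:
    """Hypothetical fingerprint of one crawler download (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def is_stable(fingerprints: Sequence[str], min_downloads: int = 3) -> bool:
    """True if the most recent `min_downloads` downloads had identical content."""
    if len(fingerprints) < min_downloads:
        return False
    recent = fingerprints[-min_downloads:]
    return len(set(recent)) == 1


def first_condition(repository_copy_fresh: bool,
                    fingerprints: Sequence[str],
                    min_downloads: int = 3) -> bool:
    # Claim 6: the first condition is satisfied only when both the
    # freshness condition and the stability condition are satisfied.
    return repository_copy_fresh and is_stable(fingerprints, min_downloads)
```

The threshold of three downloads is arbitrary; any value of two or more would fit the claim's "multiple downloads" wording.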

11. A system for accessing a document comprising at least one computer, the at least one computer including:
- a receiving unit that receives at a document server a request from a client, the request identifying a URL of a document;
  
  a lookup module that identifies at the document server a first document copy corresponding to the URL, and determines whether the first document copy is stale;
  
  instructions for execution by the document server that determine whether a first condition is satisfied using document freshness information stored at the document server when the first document copy at the document server is determined to be stale, wherein the first condition includes a freshness condition with respect to a repository copy of the document in a search engine repository; and
  
  a document retrieving unit that (A) retrieves the repository copy of the document from the search engine repository when the first document copy is determined to be stale and the first condition applies, and (B) retrieves a host copy of the document when the first document copy is determined to be stale and the first condition is not satisfied;
  
  wherein the search engine repository is at a location that is independent of the URL and distinct from the document server; and
  
  wherein the search engine repository contains documents, including the repository copy of the document, obtained by a network crawler system.
- - 12. The system of claim 11, further comprising a document serving unit that serves the first document copy from the document server to the client when the first document copy is determined not to be stale.
  - 13. The system of claim 11, wherein the first condition includes an availability condition with respect to a host associated with the URL.
  - 14. The system of claim 11, including a document serving unit that serves the repository copy of the document to the client when the first document copy at the document server is determined to be stale and the first condition is satisfied.
  - 15. The system of claim 11, wherein the lookup module determines that the first document copy is stale based at least in part on determining whether a freshness period will be exceeded within a predetermined time.

16. A system for accessing a document comprising at least one computer, the at least one computer including:
- a receiving unit that receives at a document server a request from a client, the request identifying a URL of a document;
  
  a lookup module that identifies at the document server a first document copy corresponding to the URL, and determines whether the first document copy is stale; and
  
  a document retrieving unit that (A) retrieves a repository copy of the document from a search engine repository when the first document copy is determined to be stale and a first condition is satisfied, a first condition including a freshness condition with respect to the repository copy of the document in the search engine repository and a stability condition with respect to the repository copy of the document, wherein the stability condition includes a condition that content of the document has remained unchanged over multiple downloads by the network crawler system, and (B) retrieves a host copy of the document when the first document copy is determined to be stale and the first condition is not satisfied;
  
  wherein the search engine repository is at a location that is independent of the URL and distinct from the document server; and
  
  wherein the search engine repository contains documents, including the repository copy of the document, obtained by a network crawler system;
  
  satisfying the first condition includes satisfying both the freshness condition and the stability condition; and
  
  determining whether the document is stale and the first condition is satisfied is performed at the document server.
- - 17. The system of claim 16, further comprising a document serving unit that serves the first document copy from the document server to the client when the first document copy is determined not to be stale.
  - 18. The system of claim 16, wherein the first condition includes an availability condition with respect to a host associated with the URL.
  - 19. The system of claim 16, including a document serving unit that serves the repository copy of the document to the client when the first document copy at the document server is determined to be stale and the first condition is satisfied.
  - 20. The system of claim 16, wherein the lookup module determines that the first document copy is stale based at least in part on determining whether a freshness period will be exceeded within a predetermined time.

21. A non-transitory computer readable storage medium storing one or more computer programs, the one or more computer programs comprising:
- instructions for receiving at a document server a request from a client, the request identifying a URL of a document;
  
  instructions for identifying at the document server a first document copy corresponding to the URL, and determining whether the first document copy is stale;
  
  instructions for determining, at the document server, whether a first condition is satisfied using document freshness information stored at the document server when the first document copy at the document server is determined to be stale, wherein the first condition includes a freshness condition with respect to a repository copy of the document in a search engine repository; and
  
  instructions for retrieving the repository copy of the document from the search engine repository when the first document copy at the document server is determined to be stale and the first condition applies; and
  
  instructions for retrieving a host copy of the document when the first document copy at the document server is determined to be stale and the first condition is not satisfied, wherein the search engine repository is at a location that is independent of the URL and distinct from the document server; and
  
  wherein the search engine repository contains documents, including the repository copy of the document, obtained by a network crawler system.
- - 22. The non-transitory computer readable storage medium of claim 21, including instructions for serving to the client the first document copy from the document server when the first document copy is determined not to be stale.
  - 23. The non-transitory computer readable storage medium of claim 21, wherein the first condition includes an availability condition with respect to a host associated with the URL.
  - 24. The non-transitory computer readable storage medium of claim 21, including instructions for serving the repository copy of the document to the client when the first document copy at the document server is determined to be stale and the first condition is satisfied.
  - 25. The non-transitory computer readable storage medium of claim 21, wherein the instructions for determining whether the first document copy is stale includes instructions for determining whether a freshness period will be exceeded within a predetermined time.

26. A non-transitory computer readable storage medium storing one or more computer programs, the one or more computer programs comprising:
- instructions for receiving at a document server a request from a client, the request identifying a URL of a document;
  
  instructions for identifying at the document server a first document copy corresponding to the URL, and determining whether the first document copy is stale;
  
  instructions for retrieving a repository copy of the document from a search engine repository when the first document copy at the document server is determined to be stale and a first condition is satisfied, the first condition including a freshness condition with respect to the repository copy of the document and a stability condition with respect to the repository copy of the document, wherein the stability condition includes a condition that content of the document has remained unchanged over multiple downloads by a network crawler system; and
  
  instructions for retrieving a host copy of the document when the first document copy at the document server is determined to be stale and the first condition is not satisfied, wherein the search engine repository is at a location that is independent of the URL and distinct from the document server; and
  
  wherein the search engine repository contains documents, including the repository copy of the document, obtained by the network crawler system;
  
  satisfying the first condition includes satisfying both the freshness condition and the stability condition; and
  
  determining whether the document is stale and the first condition is satisfied is performed at the document server.
- - 27. The non-transitory computer readable storage medium of claim 26, including instructions for serving to the client the first document copy from the document server when the first document copy is determined not to be stale.
  - 28. The non-transitory computer readable storage medium of claim 26, wherein the first condition includes an availability condition with respect to a host associated with the URL.
  - 29. The non-transitory computer readable storage medium of claim 26, including instructions for serving the repository copy of the document to the client when the first document copy at the document server is determined to be stale and the first condition is satisfied.
  - 30. The non-transitory computer readable storage medium of claim 26, wherein the instructions for determining whether the first document copy is stale includes instructions for determining whether a freshness period will be exceeded within a predetermined time.

31. A method for updating a cache index located at a document server distinct from a search engine and search engine repository, comprising:
- identifying a set of documents from a database, the set of documents stored in a search engine repository containing documents obtained by a network crawler system, the database including information about each document in the set of documents;
  
  identifying documents in the set of documents that satisfy predefined criteria, the predefined criteria including a requirement that document content in the search engine repository for the identified documents has remained unchanged over multiple downloads by the network crawler system; and
  
  at the document server, inserting, for each respective identified document, a respective entry in the cache index indicating that the document content of the respective identified document is suited for retrieval from the search engine repository and for serving to respective clients, wherein each respective identified document has a respective URL and wherein the search engine repository is at a location that is independent of the respective URLs of the identified documents.
- - 32. The method of claim 31, the method including:
    - receiving at the document server a request from a client, the request identifying a URL of a document;
      
      upon satisfaction of predefined conditions, including a requirement that the cache index includes an entry for the identified URL that indicates that a repository copy of the document in the search engine repository is suited for retrieval from the search engine repository and for serving to respective clients, retrieving the repository copy of the document from the search engine repository.
  - 33. The method of claim 32, including serving the repository copy of the document to the client.
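The cache-index update of claims 31 and 32 could look something like the following sketch. It assumes a dictionary-based index and a per-document count of consecutive unchanged crawler downloads; `CrawlRecord`, `update_cache_index`, `should_use_repository`, the `"source"` key and the `min_unchanged` threshold are hypothetical and do not come from the patent text. Skipping URLs that already have an entry follows the abstract.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class CrawlRecord:
    """Hypothetical database row describing one document in the repository."""
    url: str
    unchanged_downloads: int  # consecutive crawler downloads with identical content


def update_cache_index(cache_index: Dict[str, dict],
                       records: Iterable[CrawlRecord],
                       min_unchanged: int = 3) -> List[str]:
    """Insert repository entries for stable documents; return the URLs added."""
    added = []
    for record in records:
        # Predefined criteria: content unchanged over multiple downloads.
        if record.unchanged_downloads < min_unchanged:
            continue
        # Per the abstract, insert only if the index has no entry for the document.
        if record.url in cache_index:
            continue
        cache_index[record.url] = {"source": "repository"}
        added.append(record.url)
    return added


def should_use_repository(cache_index: Dict[str, dict], url: str) -> bool:
    # Claim 32: retrieve from the repository when the index entry for the
    # requested URL marks the repository copy as suited for retrieval.
    entry = cache_index.get(url)
    return entry is not None and entry.get("source") == "repository"
```

A request handler would consult `should_use_repository` before deciding whether to contact the web host.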

34. A server system, comprising:
- one or more processors;
  
  memory storing one or more computer programs, the one or more computer programs comprising:
  
  instructions for receiving at the server system a request from a client, the request identifying a URL of a document;
  
  instructions for identifying at the server system a first document copy corresponding to the URL, and determining whether the first document copy is stale;
  
  instructions for determining at the server system whether a first condition is satisfied using document freshness information stored at the server system when the first document copy at the server system is determined to be stale;
  
  the first condition including a freshness condition with respect to a repository copy of the document in a search engine repository; and
  
  instructions for retrieving the repository copy of the document from the search engine repository when the first document copy at the server system is determined to be stale and a first condition is satisfied; and
  
  instructions for retrieving a host copy of the document when the first document copy at the server system is determined to be stale and the first condition is not satisfied, wherein the search engine repository is at a location that is independent of the URL and distinct from the server system; and
  
  wherein the search engine repository contains documents, including the repository copy of the document, obtained by a network crawler system.
- - 35. The server system of claim 34, wherein the one or more computer programs include instructions for serving to the client the first document copy from the server system when the first document copy is determined not to be stale.
  - 36. The server system of claim 34, wherein the first condition includes an availability condition with respect to a host associated with the URL.
  - 37. The server system of claim 34, wherein the one or more computer programs include instructions for serving the repository copy of the document to the client when the first document copy at the server system is determined to be stale and the first condition is satisfied.
  - 38. The server system of claim 34, wherein the instructions for determining whether the first document copy is stale include instructions for determining whether a freshness period will be exceeded within a predetermined time.

39. A server system, comprising:
- one or more processors;
  
  memory storing one or more computer programs, the one or more computer programs comprising:
  
  instructions for receiving at the server system a request from a client, the request identifying a URL of a document;
  
  instructions for identifying at the server system a first document copy corresponding to the URL, and determining whether the first document copy is stale;
  
  instructions for retrieving a repository copy of the document from a search engine repository when the first document copy at the server system is determined to be stale, and a first condition is satisfied, the first condition including a freshness condition with respect to the repository copy of the document in the search engine repository and a stability condition with respect to the repository copy of the document and the stability condition includes a condition that content of the document has remained unchanged over multiple downloads by a network crawler system;
  
  instructions for retrieving a host copy of the document when the first document copy at the server system is determined to be stale, and the first condition is not satisfied; and
  
  wherein the search engine repository is at a location that is independent of the URL and distinct from the server system; and
  
  wherein the search engine repository contains documents, including the repository copy of the document, obtained by the network crawler system;
  
  satisfying the first condition includes satisfying both the freshness condition and the stability condition; and
  
  determining whether the document is stale and the first condition is satisfied is performed at the server system.
- - 40. The server system of claim 39, wherein the one or more computer programs further comprise instructions for serving to the client the first document copy from the server system when the first document copy is determined not to be stale.
  - 41. The server system of claim 39, wherein the first condition includes an availability condition with respect to a host associated with the URL.
  - 42. The server system of claim 39, wherein the one or more computer programs further comprise instructions for serving the repository copy of the document to the client when the first document copy at the server system is determined to be stale and the first condition is satisfied.

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Fredricksen, Eric Russell; Schneider, Fritz John; Dean, Jeffrey Adgate; Ghemawat, Sanjay; Provos, Niels; Harik, Georges
Primary Examiner(s)
Cao, Phuong Thao

Application Number

US12/251,413
Publication Number

US 20090037393A1
Time in Patent Office

1,442 Days
Field of Search

707/705, 707/999.003, 707/741, 707/782, 707/E17.12
US Class Current

707/782
CPC Class Codes

G06F 16/93   Document management systems

G06F 16/9574   of access to content, e.g. ...

Y10S 707/99931   Database or file accessing

Y10S 707/99932   Access augmentation or opti...
