Proxy server using a statistical model
First Claim
1. A computer-readable medium storing computer-executable instructions for retrieving one document in a plurality of documents from either a remote server or a local cache, the one document having been previously retrieved from the remote server and a copy thereof stored in the cache, that, when executed, comprise:
maintaining historical information representing prior changes to the one document;
the historical information representing prior changes to the one document comprises, for one document, a change count representing the number of times the one document has been modified, an access count representing the number of times the one document has been retrieved from the remote server, a first access time representing the time the one document was first retrieved from the remote server, and a last access time representing the time the one document was last retrieved from the remote server;
initiating a document retrieval request procedure for retrieving particular documents in the plurality of documents;
determining whether to access the one document from the remote server or the local cache, wherein the determination is based on a probabilistic analysis of the historical information representing prior changes to the one document, wherein the probabilistic analysis comprises:
computing a probability that the one document has changed since the one document was last retrieved from the remote server, the probability that the one document has changed since the document was last retrieved from the remote server being computed without examining the one document; and
beginning with a probability that a pre-defined proportion of documents in the plurality of documents has changed, training the probability that the pre-defined proportion of documents has changed using the historical information representing prior changes to the one document to achieve the probability that the one document has changed;
wherein the step of training the probability comprises:
creating a timeline using the historical information, the timeline having representations thereon of no change intervals, change intervals, and no change chunk intervals;
training the document probability distribution for each no change interval;
training the document probability distribution for each change interval; and
training the document probability distribution for each no change chunk interval.
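The four history fields recited in the claim map naturally onto a small per-document record. The following is only an illustrative sketch: the class name `DocumentHistory`, the method `record_access`, and the use of floating-point seconds for times are assumptions made for the example and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class DocumentHistory:
    """Per-document history holding the four fields named in the claim."""
    first_access_time: float  # time the document was first retrieved from the server
    last_access_time: float   # time the document was last retrieved from the server
    access_count: int = 1     # number of retrievals from the remote server
    change_count: int = 0     # number of times the document was found modified

    def record_access(self, now: float, changed: bool) -> None:
        # Hypothetical helper: update the record after a server retrieval.
        self.access_count += 1
        if changed:
            self.change_count += 1
        self.last_access_time = now


# Usage: first retrieval at t=0, then two later retrievals.
history = DocumentHistory(first_access_time=0.0, last_access_time=0.0)
history.record_access(now=10.0, changed=True)
history.record_access(now=25.0, changed=False)
```

Note that the record stores only counts and two timestamps, not the time of each individual change, which is why the claim goes on to reconstruct a timeline from these values.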
1 Assignment
0 Petitions
Abstract
A computer-based system and method for determining whether to re-fetch a previously retrieved document across a computer network is disclosed. The method uses a statistical model to determine whether the previously retrieved document has likely changed since it was last accessed. The statistical model continuously improves its accuracy by training internal probability distributions to reflect the change-rate patterns actually observed in the documents accessed. The decision whether to access the document is based on the probability of change compared against a desired synchronization level, random selections, maximum limits on the amount of time since the document was last accessed, and other criteria. Once the decision to access is made, the document is checked for changes, and this information is used to train the statistical model.
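The decision criteria listed in the abstract can be illustrated with a short sketch. The abstract does not say how the criteria are combined, so everything specific here is an assumption: the function name `should_refetch`, the rule that a change probability above `1 - sync_level` triggers a re-fetch, and the `explore_rate` parameter used to model the "random selections".

```python
import random


def should_refetch(p_change, sync_level, age_seconds, max_age_seconds,
                   explore_rate=0.05, rng=random):
    """Decide whether to re-fetch a cached document (illustrative only).

    p_change        -- estimated probability the document changed since last access
    sync_level      -- desired probability that a served copy is current (assumed meaning)
    age_seconds     -- time since the document was last accessed
    max_age_seconds -- hard limit on the time since last access
    explore_rate    -- hypothetical chance of a random re-fetch
    """
    # Maximum limit on the time since the document was last accessed.
    if age_seconds >= max_age_seconds:
        return True
    # Probability of change compared against the desired synchronization level.
    if p_change > 1.0 - sync_level:
        return True
    # Random selection, so that rarely changing documents are still sampled.
    return rng.random() < explore_rate


# Usage: a document with a 50% change estimate under a 90% synchronization target.
decision = should_refetch(p_change=0.5, sync_level=0.9,
                          age_seconds=10, max_age_seconds=100)
```

The random branch matters for training: without it, a document once judged stable might never be checked again, and the model would receive no new evidence about it.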
187 Citations
21 Claims
1. A computer-readable medium storing computer-executable instructions for retrieving one document in a plurality of documents from either a remote server or a local cache, the one document having been previously retrieved from the remote server and a copy thereof stored in the cache, that, when executed, comprise:
maintaining historical information representing prior changes to the one document;
the historical information representing prior changes to the one document comprises, for one document, a change count representing the number of times the one document has been modified, an access count representing the number of times the one document has been retrieved from the remote server, a first access time representing the time the one document was first retrieved from the remote server, and a last access time representing the time the one document was last retrieved from the remote server;
initiating a document retrieval request procedure for retrieving particular documents in the plurality of documents;
determining whether to access the one document from the remote server or the local cache, wherein the determination is based on a probabilistic analysis of the historical information representing prior changes to the one document, wherein the probabilistic analysis comprises:
computing a probability that the one document has changed since the one document was last retrieved from the remote server, the probability that the one document has changed since the document was last retrieved from the remote server being computed without examining the one document; and
beginning with a probability that a pre-defined proportion of documents in the plurality of documents has changed, training the probability that the pre-defined proportion of documents has changed using the historical information representing prior changes to the one document to achieve the probability that the one document has changed;
wherein the step of training the probability comprises:
creating a timeline using the historical information, the timeline having representations thereon of no change intervals, change intervals, and no change chunk intervals;
training the document probability distribution for each no change interval;
training the document probability distribution for each change interval; and
training the document probability distribution for each no change chunk interval.
Dependent claims: 2, 3, 4, 5, 6, 7
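The timeline step in claim 1 can be sketched as follows. The claim text does not define the three interval types or say how they are derived from the counts, so this example rests on loudly stated assumptions: accesses are treated as evenly spaced between the first and last access times, changes are spread evenly across the gaps between accesses, and a "no change chunk" is taken to mean a run of two or more consecutive no-change gaps merged into one longer interval. All names (`Kind`, `Interval`, `build_timeline`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    NO_CHANGE = "no change"
    CHANGE = "change"
    NO_CHANGE_CHUNK = "no change chunk"


@dataclass(frozen=True)
class Interval:
    kind: Kind
    length: float  # duration of the interval, in the same unit as the access times


def build_timeline(first_access, last_access, access_count, change_count):
    """Reconstruct an approximate timeline from the four history fields."""
    gaps = access_count - 1  # intervals between consecutive accesses
    if gaps <= 0:
        return []
    step = (last_access - first_access) / gaps  # assumed even spacing
    changes = min(change_count, gaps)
    timeline = []
    quiet = 0  # consecutive no-change gaps not yet emitted

    def flush_quiet():
        nonlocal quiet
        if quiet == 1:
            timeline.append(Interval(Kind.NO_CHANGE, step))
        elif quiet > 1:
            # Assumed meaning of "chunk": several no-change gaps merged into one.
            timeline.append(Interval(Kind.NO_CHANGE_CHUNK, quiet * step))
        quiet = 0

    for i in range(gaps):
        # Spread the observed changes evenly over the gaps (assumption).
        changed = ((i + 1) * changes) // gaps > (i * changes) // gaps
        if changed:
            flush_quiet()
            timeline.append(Interval(Kind.CHANGE, step))
        else:
            quiet += 1
    flush_quiet()
    return timeline


# Usage: 7 accesses over 60 time units with 2 observed changes.
timeline = build_timeline(0.0, 60.0, access_count=7, change_count=2)
```

Because only counts are stored, any such timeline is a reconstruction, and other spacing assumptions would be equally consistent with the claim language.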
8. A computer-implemented method for retrieving one document in a plurality of documents from either a remote server or a local cache, the one document having been previously retrieved from the remote server and a copy thereof stored in the cache, the method comprising:
maintaining historical information representing prior changes to the one document;
the historical information representing prior changes to the one document comprises, for one document, a change count representing the number of times the one document has been modified, an access count representing the number of times the one document has been retrieved from the remote server, a first access time representing the time the one document was first retrieved from the remote server, and a last access time representing the time the one document was last retrieved from the remote server;
initiating a document retrieval request procedure for retrieving particular documents in the plurality of documents; and
determining whether to access the one document from the remote server or the local cache, wherein the determination is based on a probabilistic analysis of the historical information representing prior changes to the one document, wherein the probabilistic analysis comprises:
computing a probability that the one document has changed since the one document was last retrieved from the remote server, the probability that the one document has changed since the document was last retrieved from the remote server being computed without examining the one document; and
beginning with a probability that a pre-defined proportion of documents in the plurality of documents has changed, training the probability that the pre-defined proportion of documents has changed using the historical information representing prior changes to the one document to achieve the probability that the one document has changed;
wherein the step of training the probability comprises:
creating a timeline using the historical information, the timeline having representations thereon of no change intervals, change intervals, and no change chunk intervals;
training the document probability distribution for each no change interval;
training the document probability distribution for each change interval; and
training the document probability distribution for each no change chunk interval.
Dependent claims: 9, 10, 11, 12, 13, 14
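The training steps in claim 8 start from a population-wide prior and refine it per document. One way to picture this is a Bayesian update over a few candidate change rates. This is a sketch under assumptions that are not in the claim text: changes are modelled as a Poisson process, the candidate rates in `RATES` and the prior proportions are invented for illustration, and a no change chunk is trained exactly like a single long no-change interval.

```python
import math

# Hypothetical candidate change rates (changes per second): hourly, daily, weekly.
RATES = [1 / 3600, 1 / 86400, 1 / 604800]


def _normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]


def train_no_change(dist, length):
    """Update the distribution after an interval in which no change was seen."""
    # Under a Poisson model, P(no change in `length`) = exp(-rate * length).
    return _normalise([p * math.exp(-r * length) for p, r in zip(dist, RATES)])


def train_change(dist, length):
    """Update the distribution after an interval in which a change was seen."""
    return _normalise([p * (1 - math.exp(-r * length)) for p, r in zip(dist, RATES)])


def probability_changed(dist, elapsed):
    """Probability of a change since the last retrieval, without fetching."""
    return sum(p * (1 - math.exp(-r * elapsed)) for p, r in zip(dist, RATES))


# Usage: begin with assumed population proportions, then train on one document.
prior = [0.2, 0.5, 0.3]
posterior = train_no_change(prior, length=86400)     # unchanged for a day
posterior = train_change(posterior, length=3600)     # then changed within an hour
p = probability_changed(posterior, elapsed=3600)
```

The final call uses only the trained distribution and the elapsed time, which matches the claim's requirement that the probability be computed without examining the document itself.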
15. A system for retrieving one document in a plurality of documents from either a remote server or a local cache, the one document having been previously retrieved from the remote server and a copy thereof stored in the cache, the system comprising:
a processor; and
a memory having computer-executable instructions stored thereon, the computer-executable instructions including instructions for:
maintaining historical information representing prior changes to the one document;
the historical information representing prior changes to the one document comprises, for one document, a change count representing the number of times the one document has been modified, an access count representing the number of times the one document has been retrieved from the remote server, a first access time representing the time the one document was first retrieved from the remote server, and a last access time representing the time the one document was last retrieved from the remote server;
initiating a document retrieval request procedure for retrieving particular documents in the plurality of documents; and
determining whether to access the one document from the remote server or the local cache, wherein the determination is based on a probabilistic analysis of the historical information representing prior changes to the one document, wherein the probabilistic analysis comprises:
computing a probability that the one document has changed since the one document was last retrieved from the remote server, the probability that the one document has changed since the document was last retrieved from the remote server being computed without examining the one document; and
beginning with a probability that a pre-defined proportion of documents in the plurality of documents has changed, training the probability that the pre-defined proportion of documents has changed using the historical information representing prior changes to the one document to achieve the probability that the one document has changed;
wherein the step of training the probability comprises:
creating a timeline using the historical information, the timeline having representations thereon of no change intervals, change intervals, and no change chunk intervals;
training the document probability distribution for each no change interval;
training the document probability distribution for each change interval; and
training the document probability distribution for each no change chunk interval.
Dependent claims: 16, 17, 18, 19, 20, 21
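The cache-or-server choice at the heart of the system claim can be sketched as a single retrieval function. This is an illustration only: the function name `retrieve`, the in-memory dictionary used as the cache, the injected `fetch_remote` callable, and the simple `threshold` comparison are all assumptions rather than details taken from the patent.

```python
from typing import Callable, Dict, Tuple


def retrieve(url: str,
             cache: Dict[str, bytes],
             fetch_remote: Callable[[str], bytes],
             p_change: float,
             threshold: float) -> Tuple[bytes, str, bool]:
    """Return (body, source, changed) for one document.

    p_change  -- estimated probability the document changed since last retrieval
    threshold -- hypothetical cut-off above which the server is consulted
    """
    # Serve the cached copy when a change is judged unlikely.
    if url in cache and p_change <= threshold:
        return cache[url], "cache", False
    # Otherwise go to the remote server and check whether the copy differs.
    body = fetch_remote(url)
    changed = cache.get(url) != body
    cache[url] = body
    return body, "server", changed


# Usage with a stand-in for the remote server.
cache = {"http://example.com/a": b"old"}
body, source, changed = retrieve("http://example.com/a", cache,
                                 lambda u: b"new", p_change=0.9, threshold=0.5)
```

The returned `changed` flag is the observation that would be fed back into the per-document history and the statistical model after each server access.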
Specification