Web server for multi-version web documents
Abstract
A repository server that provides stored copies of Web-accessible documents. A client of the repository server may register a document in the repository server. The repository server makes a copy of the registered document and returns a repository URL for the copy to the client. The repository URL may be used to fetch the copy from the repository server. Registration further relates the stored copy to its document URL, to an identifier for the stored copy, to a fingerprint that is a condensed representation of the stored copy's content and can be used to determine degrees of similarity other than match-no match, and to a set of stored copies having similar content. The fingerprints are used to compute similarity. The similarity computation further employs comparisons of links in the documents and of document URLs to determine whether it is necessary to use the fingerprints to compute similarity.
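The two-stage similarity computation summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `max_link_diff` threshold, the word-shingle fingerprint, and the Jaccard-style score are all assumptions chosen for illustration, and the sketch omits the rates of change and the document URL comparison. It only shows the general idea of a cheap link-count filter deciding whether the fingerprint comparison runs at all.

```python
import hashlib
import re
from typing import Optional

# Hypothetical helpers for illustration; none of these names come from the patent.
LINK_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def count_links(html: str) -> int:
    """Total number of href links found in the document."""
    return len(LINK_RE.findall(html))


def fingerprint(html: str, k: int = 4, size: int = 64) -> frozenset:
    """Condensed representation of the content (assumed scheme):
    hash every k-word shingle and keep the `size` smallest hashes."""
    words = TAG_RE.sub(" ", html).lower().split()
    shingles = {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}
    hashes = sorted(
        int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")
        for s in shingles
    )
    return frozenset(hashes[:size])


def similarity(fp_a: frozenset, fp_b: frozenset) -> float:
    """Graded score in [0, 1]: overlap of the two fingerprints (Jaccard)."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 1.0


def compare(html_a: str, html_b: str, max_link_diff: int = 2) -> Optional[float]:
    """Stage 1: compare link counts only, without touching fingerprints.
    Stage 2: compute fingerprints and a similarity value only if the
    link-count difference falls within the assumed range."""
    if abs(count_links(html_a) - count_links(html_b)) > max_link_diff:
        return None  # filtered out; fingerprints are never generated
    return similarity(fingerprint(html_a), fingerprint(html_b))


doc_a = '<a href="x">x</a> the quick brown fox jumps over the lazy dog'
doc_b = '<a href="x">x</a> the quick brown fox jumps over the lazy cat'
doc_c = '<a href="1"></a><a href="2"></a><a href="3"></a><a href="4"></a> text'

print(compare(doc_a, doc_b))  # graded similarity value
print(compare(doc_a, doc_c))  # None: link counts differ too much
```

Here `compare` returns `None` when the filter rejects the pair, so a caller can distinguish "not compared" from "compared and dissimilar". A real system would presumably also cache fingerprints so that they are generated only when they do not already exist.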
24 Claims
1. A method of comparing digital documents, the method comprising:
- filtering documents to reduce a total number of documents for fingerprint comparison at least by:
performing a comparison of respective total numbers of links respectively included in a pair of documents in the documents and one or more websites pointed to by the respective total numbers of links based in part or in whole upon rates of change comprising a first rate of change and a second rate of change, wherein the first rate of change associated with at least one respective total number of links is slower than the second rate of change associated with contents of at least one document of the pair of documents in the documents, and the comparison is performed without comparing fingerprints of the pair of documents; and
determining whether or not to compare the fingerprints of the pair of documents based at least in part upon the rates of change and whether a degree of difference in the respective total numbers of links from the comparison falls within a predetermined range; and
in response to determining that the fingerprints should be compared, generating the fingerprints on determining the fingerprints are not existing, and comparing the fingerprints, at a computing system, to determine a similarity value indicating a degree of similarity of the pair of documents being compared, the similarity value expressing degrees of similarity in addition to whether the pair of documents being compared are identical or not identical.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
17. A computer program product embodied on a non-transitory computer readable medium, the non-transitory computer readable medium having stored a sequence of instructions executed by a processor causes the process to perform a set of acts, the set of acts comprising:
- filtering documents to reduce a total number of documents for fingerprint comparison at least by:
performing a comparison of respective total numbers of links respectively included in a pair of documents in the documents and one or more websites pointed to by the respective total numbers of links based in part or in whole upon rates of change comprising a first rate of change and a second rate of change, wherein the first rate of change associated with at least one respective total number of links is slower than the second rate of change associated with contents of at least one document of the pair of documents in the documents, and the comparison is performed without comparing fingerprints of the pair of documents; and
determining whether or not to compare the fingerprints of the pair of documents based at least in part upon the rates of change and whether a degree of difference in the respective total numbers of links from the comparison falls within a predetermined range; and
in response to determining that the fingerprints should be compared, generating the fingerprints on determining the fingerprints are not existing, and comparing the fingerprints, at a computing system, to determine a similarity value indicating a degree of similarity of the pair of documents being compared, the similarity value expressing degrees of similarity in addition to whether the pair of documents being compared are identical or not identical.
Dependent claims: 18, 19, 20
21. An apparatus for comparing digital documents, the apparatus comprising:
a processor and a memory to store instructions, the instructions is executed by the processor to perform:
- filtering documents to reduce a total number of documents for fingerprint comparison at least by:
performing a comparison of respective total numbers of links respectively included in a pair of documents in the documents and one or more websites pointed to by the respective total numbers of links based in part or in whole upon rates of change comprising a first rate of change and a second rate of change, wherein the first rate of change associated with at least one respective total number of links is slower than the second rate of change associated with contents of at least one document of the pair of documents in the documents, and the comparison is performed without comparing fingerprints of the pair of documents; and
determining whether or not to compare the fingerprints of the pair of documents based at least in part upon the rates of change and whether a degree of difference in the respective total numbers of links from the comparison falls within a predetermined range; and
in response to determining that the fingerprints should be compared, generating the fingerprints on determining the fingerprints are not existing, and comparing the fingerprints, at a computing system, to determine a similarity value indicating a degree of similarity of the pair of documents being compared, the similarity value expressing degrees of similarity in addition to whether the pair of documents being compared are identical or not identical.
Dependent claims: 22, 23, 24
Specification