Web Server for Multi-Version Web Documents
Abstract
A repository server that provides stored copies of Web-accessible documents. A client of the repository server may register a document in the repository server. The repository server makes a copy of the registered document and returns a repository URL for the copy to the client. The repository URL may be used to fetch the copy from the repository URL. Registration further relates the stored copy to its document URL, to an identifier for the stored copy, to a fingerprint that is a condensed representation of the stored copy's content and can be used to determine degrees of similarity other than match-no match, and to a set of stored copies having similar content. The fingerprints are used to compute similarity. The similarity computation further employs comparisons of links in the documents and of document URLs to determine whether it is necessary to use the fingerprints to compute similarity.
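The registration flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `StoredCopy`, `Repository`, `register`, and `fetch`, the URL layout, and the pluggable `make_fingerprint` callable are all assumptions. The sketch also takes the document content as an argument instead of fetching it over the network, and it omits the grouping of stored copies with similar content.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class StoredCopy:
    copy_id: int        # identifier for the stored copy
    document_url: str   # URL the document was registered under
    content: str        # the stored copy itself
    fingerprint: str    # condensed representation of the content


class Repository:
    """Hypothetical in-memory stand-in for the repository server."""

    def __init__(self, base_url: str, make_fingerprint: Callable[[str], str]):
        self.base_url = base_url
        self.make_fingerprint = make_fingerprint  # assumed to be supplied by the caller
        self.copies: Dict[str, StoredCopy] = {}   # repository URL -> stored copy

    def register(self, document_url: str, content: str) -> str:
        """Store a copy and return the repository URL for it."""
        copy_id = len(self.copies) + 1
        repository_url = f"{self.base_url}{copy_id}"
        self.copies[repository_url] = StoredCopy(
            copy_id, document_url, content, self.make_fingerprint(content)
        )
        return repository_url

    def fetch(self, repository_url: str) -> str:
        """Return the stored copy that a repository URL points at."""
        return self.copies[repository_url].content
```

A client would call `register` once per document version and keep the returned repository URL; later edits to the live document do not change what `fetch` returns for that URL.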
127 Citations
21 Claims
1. A method of comparing digital documents, the method comprising the steps of:
- making fingerprints of the documents to be compared; and
- comparing the fingerprints to determine a similarity value indicating a degree of similarity of the documents being compared, the similarity value being capable of expressing degrees of similarity in addition to identical or not identical.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method of making a fingerprint for a digital document that contains both structural information and content information, the method comprising the steps of:
- encoding the structural information using a first encoding that preserves semantic information about the structural information; and
- encoding the content information using a second encoding.
Dependent claims: 12, 13, 14
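One way to picture the two encodings of claim 11 is the sketch below, which fingerprints an HTML document. It is an illustration under stated assumptions, not the encoding the patent specifies: the `TAG_CODES` table, the use of uppercase letters for structure, and the two-character SHA-256 prefix for content are all hypothetical choices. Each tag keeps a recognizable code, so its meaning survives in the fingerprint, while each run of text is reduced to a short lowercase hash.

```python
import hashlib
from html.parser import HTMLParser

# Hypothetical first encoding: one fixed uppercase code per tag name,
# so the kind of structural element can be read back from the fingerprint.
TAG_CODES = {"html": "H", "body": "B", "p": "P", "a": "A", "h1": "T"}


class FingerprintBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Structural information: unknown tags share the code "X".
        self.parts.append(TAG_CODES.get(tag, "X"))

    def handle_data(self, data):
        # Content information: hypothetical second encoding, the first two
        # lowercase hex digits of a hash of the whitespace-normalized text.
        text = " ".join(data.split())
        if text:
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            self.parts.append(digest[:2])


def make_fingerprint(html_text: str) -> str:
    builder = FingerprintBuilder()
    builder.feed(html_text)
    builder.close()
    return "".join(builder.parts)
```

Because structural codes are uppercase and content codes are lowercase hex, the two kinds of encoding stay distinguishable inside the one fingerprint string.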
15. A method of comparing a first fingerprint for a first digital document with a second fingerprint for a second digital document, the fingerprints containing both structural and content information from the digital documents and being made by encoding the structural information using a first encoding that preserves semantic information about the structural information and encoding the content information using a second encoding and the method comprising the steps of:
  1. finding an encoding of structural information in the first fingerprint;
  2. finding a substring in the second fingerprint that matches a substring in the first fingerprint that begins at the found encoding;
  3. adding the length of the found substring to a running length total;
  4. finding another encoding of structural information in the first fingerprint that is not contained in any found substring and repeating steps 1-3 with the other encoding;
  5. repeating steps 1-4 until no further encodings of structural information can be found in step 4; and
  6. using the length of one of the fingerprints and the running length total to compute a similarity value.
Dependent claims: 16
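The steps of claim 15 can be sketched as follows. Several details here are assumptions that the claim leaves open: fingerprints are taken to be strings in which uppercase letters are structural encodings, the matching substring is chosen to be the longest one available, and the similarity value is the running total divided by the length of the first fingerprint. The function names are hypothetical.

```python
def longest_prefix_in(source: str, start: int, haystack: str) -> int:
    """Length of the longest prefix of source[start:] that occurs in haystack."""
    length = 0
    while (start + length < len(source)
           and source[start:start + length + 1] in haystack):
        length += 1
    return length


def similarity(fp1: str, fp2: str) -> float:
    """Similarity of two fingerprints as a value between 0.0 and 1.0."""
    if not fp1:
        return 0.0
    total = 0  # running length total
    i = 0
    while i < len(fp1):
        if fp1[i].isupper():                          # steps 1 and 4: next structural encoding
            matched = longest_prefix_in(fp1, i, fp2)  # step 2: matching substring in fp2
            total += matched                          # step 3: add its length
            i += max(matched, 1)                      # skip encodings inside the match
        else:
            i += 1
    return total / len(fp1)                           # step 6: assumed ratio


print(similarity("HBP3fP9a", "HBP3fP00"))  # 0.75
```

In the example, the match starting at `H` covers six of the eight characters in the first fingerprint, and no structural encoding remains outside that match, so the result is 6/8.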
17. A fingerprint for a digital document, the digital document including both structural information and content information and the fingerprint comprising:
- a structural encoding portion in which a first encoding preserves semantic information about components of the digital document's structural information; and
- a content encoding portion in which a second encoding preserves content information about components of the digital document's content,
whereby a degree of similarity other than identity or lack thereof between two digital documents is determinable by comparing the fingerprints belonging to the digital documents.
Dependent claims: 18, 19, 20, 21
Specification