Method and apparatus for a character-based comparison of documents
Abstract
A method and system for a character-based document comparison are described. In one embodiment, the method includes dividing a first document into tokens. Each token includes a predefined number of sequential characters from the first document. The method further includes calculating hash values for the tokens and creating, for the first document, a signature including a subset of hash values from the calculated hash values and additional information pertaining to the tokens of the first document. The signature of the first document is subsequently compared with a signature of a second document to determine resemblance between the first document and the second document.
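The signature-creation steps in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the function name `make_signature`, the overlapping sliding window, the token length of 4, the truncated MD5 hash, the "keep the smallest values" selection rule, and the use of a token count as the "additional information" are all hypothetical choices that the text above does not specify.

```python
import hashlib


def make_signature(text, token_len=4, keep=8):
    """Build a signature from fixed-length character tokens (hypothetical helper)."""
    # Assumption: tokens are overlapping windows of token_len sequential characters.
    tokens = [text[i:i + token_len] for i in range(len(text) - token_len + 1)]

    # Assumption: each token is hashed with MD5, truncated to 64 bits.
    hashes = {
        int.from_bytes(hashlib.md5(t.encode("utf-8")).digest()[:8], "big")
        for t in tokens
    }

    # Assumption: the subset is the `keep` smallest distinct hash values.
    subset = sorted(hashes)[:keep]

    # Assumption: the "additional information" is simply the total token count.
    return {"hashes": subset, "token_count": len(tokens)}


if __name__ == "__main__":
    print(make_signature("the quick brown fox jumps over the lazy dog"))
```

Selecting the smallest hash values keeps the subset deterministic, so two documents that share most of their tokens will tend to share most of the selected hashes.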
27 Claims
1. A method comprising:
dividing a first document into a plurality of tokens, each token including a predefined number of sequential characters from the first document;
calculating a plurality of hash values for the plurality of tokens; and
creating, for the first document, a signature including a subset of hash values from the plurality of hash values and additional information pertaining to the plurality of tokens of the first document, the signature of the first document being subsequently compared with a signature of a second document to determine resemblance between the first document and the second document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A system comprising:
a parser to divide a first document into a plurality of tokens, each token including a predefined number of sequential characters from the first document; and
a message data generator to calculate a plurality of hash values for the plurality of tokens, and to create, for the first document, a signature including a subset of hash values from the plurality of hash values and additional information pertaining to the plurality of tokens of the first document, the signature of the first document being subsequently compared with a signature of a second document to determine resemblance between the first document and the second document.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21
22. An apparatus comprising:
means for dividing a first document into a plurality of tokens, each token including a predefined number of sequential characters from the first document;
means for calculating a plurality of hash values for the plurality of tokens; and
means for creating, for the first document, a signature including a subset of hash values from the plurality of hash values and additional information pertaining to the plurality of tokens of the first document, the signature of the first document being subsequently compared with a signature of a second document to determine resemblance between the first document and the second document.
Dependent claims: 23, 24
25. A computer readable medium comprising executable instructions which when executed on a processing system cause said processing system to perform a method comprising:
dividing a first document into a plurality of tokens, each token including a predefined number of sequential characters from the first document;
calculating a plurality of hash values for the plurality of tokens; and
creating, for the first document, a signature including a subset of hash values from the plurality of hash values and additional information pertaining to the plurality of tokens of the first document, the signature of the first document being subsequently compared with a signature of a second document to determine resemblance between the first document and the second document.
Dependent claims: 26, 27
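The independent claims state that two signatures are compared to determine resemblance but do not say how. One possible comparison, shown here purely as an assumption, is the overlap ratio (Jaccard index) of the two hash subsets. The function name `resemblance` and the choice of metric are hypothetical and are not taken from the claims.

```python
def resemblance(hashes_a, hashes_b):
    """Return the overlap ratio of two hash subsets (hypothetical metric)."""
    a, b = set(hashes_a), set(hashes_b)

    # Two empty signatures carry no evidence of resemblance.
    if not a and not b:
        return 0.0

    # Jaccard index: shared hash values divided by all distinct hash values.
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(resemblance([1, 2, 3], [2, 3, 4]))
```

A caller could treat two documents as resembling each other when this ratio exceeds some threshold; the threshold value would be a tuning decision that the claims do not fix.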
Specification