Method for estimating the probability of collisions of fingerprints
Abstract
Strings, such as Web pages or other documents, are fingerprinted in order to detect substantially similar strings, so as to avoid processing duplicate strings. At the same time, a computerized method estimates the probability of a collision among fingerprints of dissimilar strings. As fingerprints are generated for strings presented for processing, when the fingerprint of a string is determined not to be identical to any fingerprint in a set of stored fingerprints, the new fingerprint is masked and the unmasked portion of the fingerprint is compared with the corresponding portion of the fingerprints in the stored set. Information is recorded regarding the number of matching masked fingerprints.
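The abstract does not state how the recorded near-collision counts are turned into a probability estimate. As a rough illustration only, the sketch below assumes that fingerprints of dissimilar strings behave like independent, uniformly random bit strings. Under that assumption, a pair that agrees on the unmasked bits also agrees on the remaining m masked bits with probability 2^-m, so the expected number of full collisions is the near-collision count divided by (2^m - 1). The function name and parameters are hypothetical and do not come from the patent.

```python
def estimate_expected_collisions(near_collisions: int,
                                 fingerprint_bits: int,
                                 unmasked_bits: int) -> float:
    """Scale an observed near-collision count to an expected full-collision count.

    Assumption (not from the source): fingerprints of dissimilar strings are
    independent and uniformly distributed over all fingerprint_bits-bit values.
    Then, among pairs that agree on the unmasked bits, the ratio of full
    collisions to near collisions is 2**-m / (1 - 2**-m) = 1 / (2**m - 1),
    where m is the number of masked bits.
    """
    if not 0 < unmasked_bits < fingerprint_bits:
        raise ValueError("unmasked_bits must be between 1 and fingerprint_bits - 1")
    if near_collisions < 0:
        raise ValueError("near_collisions must be non-negative")
    masked_bits = fingerprint_bits - unmasked_bits
    return near_collisions / (2 ** masked_bits - 1)


# Hypothetical numbers: 64-bit fingerprints, 32 unmasked bits,
# 1000 near collisions observed during a run.
expected = estimate_expected_collisions(1000, 64, 32)
print(f"expected full collisions: {expected:.3e}")
```

The point of masking is visible in the arithmetic: full collisions are too rare to observe directly, while near collisions on a shorter unmasked portion occur often enough to be counted and scaled.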
8 Claims
1. A computer implemented method of detecting near-collisions of fingerprints of strings, comprising repeatedly performing the steps of:
- receiving a string;
- applying a one way function to the string to generate a fingerprint;
- comparing the generated fingerprint with a set of fingerprints for previously processed strings to generate a comparison result;
- processing the received string in accordance with the comparison result;
- masking the generated fingerprint to generate a masked fingerprint, the masked fingerprint having an unmasked portion;
- detecting near collisions of the generated fingerprint with the fingerprints for previously processed strings by comparing the unmasked portion of the fingerprint with a corresponding portion of the fingerprints for previously processed strings and storing near collision information for each fingerprint of a previously processed string that is not identical to the generated fingerprint and that matches the unmasked portion of the generated fingerprint.
Dependent claims: 2, 3, 4
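The steps of claim 1 can be sketched as a small loop body. This is a minimal illustration under stated assumptions, not the patented implementation: the 64-bit fingerprint width, the use of truncated SHA-256 as the one way function, the choice of the low 16 bits as the unmasked portion, and all class and attribute names are hypothetical. Following the abstract, near-collision detection runs only when the new fingerprint is not already stored.

```python
import hashlib

UNMASKED_BITS = 16                      # assumed; the claim does not fix a width
MASK = (1 << UNMASKED_BITS) - 1         # keeps the low-order (unmasked) bits


def fingerprint(s: str) -> int:
    """One way function (stand-in choice): first 8 bytes of SHA-256."""
    digest = hashlib.sha256(s.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class NearCollisionDetector:
    def __init__(self, fingerprint_fn=fingerprint, mask=MASK):
        self.fingerprint_fn = fingerprint_fn
        self.mask = mask
        self.seen = set()               # fingerprints of previously processed strings
        self.by_unmasked = {}           # unmasked portion -> stored fingerprints
        self.near_collisions = []       # (new fingerprint, stored fingerprint) pairs

    def process(self, s: str) -> bool:
        """Return True if the string is new, False if its fingerprint is already stored."""
        fp = self.fingerprint_fn(s)             # apply the one way function
        is_duplicate = fp in self.seen          # comparison result
        if is_duplicate:
            return False                        # caller skips duplicate strings
        unmasked = fp & self.mask               # mask the fingerprint
        # Every stored fingerprint in this bucket matches the unmasked portion
        # and differs from fp (fp is not stored yet), so it is a near collision.
        for stored in self.by_unmasked.get(unmasked, []):
            self.near_collisions.append((fp, stored))
        self.seen.add(fp)
        self.by_unmasked.setdefault(unmasked, []).append(fp)
        return True


detector = NearCollisionDetector()
for page in ["page one", "page two", "page one"]:
    status = "new" if detector.process(page) else "duplicate"
    print(page, "->", status)
```

Indexing stored fingerprints by their unmasked portion keeps the near-collision check to one dictionary lookup per string; a linear scan over the stored set would satisfy the claim language equally well.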
5. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
- a fingerprinting module for applying a one way function to a string to generate a fingerprint; and
- a fingerprint processing module, including:
  - a first comparison mechanism that compares the generated fingerprint with a set of fingerprints for previously processed strings to generate a comparison result, and directs subsequent processing of the string in accordance with the comparison result; and
  - a second comparison mechanism that:
    - masks the generated fingerprint to generate a masked fingerprint, the masked fingerprint having an unmasked portion;
    - detects near collisions of the generated fingerprint with the fingerprints for previously processed strings by comparing the unmasked portion of the fingerprint with a corresponding portion of the fingerprints for previously processed strings; and
    - stores near collision information for each fingerprint of a previously processed string that is not identical to the generated fingerprint and that matches the unmasked portion of the generated fingerprint.
Dependent claims: 6, 7, 8
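Claim 5 describes the same idea as a set of modules. The sketch below shows one way that decomposition might look; the class names, method names, the 16-bit unmasked portion, and the per-fingerprint counter are all hypothetical choices made for illustration, and the linear scan in the second comparison is kept deliberately simple rather than efficient.

```python
import hashlib
from collections import Counter


class FingerprintingModule:
    """Applies a one way function to a string (truncated SHA-256 as a stand-in)."""

    def generate(self, s: str) -> int:
        digest = hashlib.sha256(s.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")


class FingerprintProcessingModule:
    def __init__(self, unmasked_bits: int = 16):
        self.mask = (1 << unmasked_bits) - 1
        self.stored = set()                      # fingerprints of processed strings
        self.near_collision_counts = Counter()   # stored fingerprint -> near collisions

    def first_comparison(self, fp: int) -> bool:
        """Return True when fp is already stored, i.e. the string is a duplicate."""
        return fp in self.stored

    def second_comparison(self, fp: int) -> int:
        """Record stored fingerprints that differ from fp but match its unmasked bits."""
        unmasked = fp & self.mask
        matches = [old for old in self.stored
                   if old != fp and (old & self.mask) == unmasked]
        for old in matches:
            self.near_collision_counts[old] += 1
        return len(matches)

    def handle(self, fp: int) -> str:
        """Direct subsequent processing: 'skip' duplicates, 'process' new strings."""
        if self.first_comparison(fp):
            return "skip"
        self.second_comparison(fp)
        self.stored.add(fp)
        return "process"


fingerprinter = FingerprintingModule()
processor = FingerprintProcessingModule()
for doc in ["alpha", "beta", "alpha"]:
    print(doc, "->", processor.handle(fingerprinter.generate(doc)))
```

Keeping the two comparison mechanisms as separate methods mirrors the claim: the first decides what happens to the string, while the second only gathers statistics and never affects that decision.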
Specification