Method and apparatus for duplicate detection
First Claim
1. A computer readable storage medium, comprising executable instructions to:
compute a first measure of similarity between a first document and a reference document, wherein the first measure of similarity is a scalar or a multi-dimensional indication of distance between the first document and the reference document;
compute a second measure of similarity between a second document and the reference document, wherein the second measure of similarity is a scalar or a multi-dimensional indication of distance between the second document and the reference document;
compare the first measure of similarity and the second measure of similarity through triangulation to identify a non-exact similarity match between the first document and the second document; and
perform a direct comparison of the first document and the second document in response to the identified non-exact similarity match to compute a third measure of similarity, wherein the third measure of similarity is a scalar or a multi-dimensional indication of distance between the first document and the second document, and the third measure of similarity has a finer granularity than the first measure of similarity and the second measure of similarity.
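The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the distance functions, the function names (`coarse_distance`, `fine_distance`, `find_near_duplicate`), and the `tolerance` parameter are all assumptions chosen for illustration. The coarse measure here is a Jaccard distance over word sets, and the finer-grained direct comparison uses the standard library's `difflib.SequenceMatcher` at character level.

```python
from difflib import SequenceMatcher
from typing import Optional


def coarse_distance(doc: str, reference: str) -> float:
    """Coarse scalar distance (assumed): Jaccard distance over word sets."""
    a = set(doc.lower().split())
    b = set(reference.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def fine_distance(doc_a: str, doc_b: str) -> float:
    """Finer-grained distance (assumed): direct character-level comparison."""
    return 1.0 - SequenceMatcher(None, doc_a, doc_b).ratio()


def find_near_duplicate(
    first: str, second: str, reference: str, tolerance: float = 0.1
) -> Optional[float]:
    """Return a fine-grained distance if triangulation flags a candidate match.

    Returns None when the two documents sit at clearly different distances
    from the reference, so no direct comparison is performed.
    """
    d1 = coarse_distance(first, reference)   # first measure of similarity
    d2 = coarse_distance(second, reference)  # second measure of similarity

    # Triangulation: Jaccard distance is a metric, so
    # d(first, second) >= |d1 - d2|. A large gap rules the pair out.
    if abs(d1 - d2) > tolerance:
        return None

    # Candidate non-exact match: compute the third, finer-grained measure.
    return fine_distance(first, second)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    first = "the quick brown fox jumps over the lazy cat"
    second = "the quick brown fox jumps over a lazy cat"
    unrelated = "completely different text about patents"

    print(find_near_duplicate(first, second, reference))
    print(find_near_duplicate(first, unrelated, reference))
```

In this sketch the triangulation test is only a filter: two documents at similar distances from the reference are not necessarily similar to each other, which is why the direct comparison follows for candidate pairs only.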
Abstract
The present invention includes a method and device for detecting duplicate documents by triangulation. Particular aspects of the present invention are described in the claims, specification and drawings.
22 Citations
10 Claims
Specification