Duplicate document detection
Abstract
In a single-signature duplicate document system, a secondary set of attributes is used in addition to a primary set of attributes so as to improve the precision of the system. When the projection of a document onto the primary set of attributes is below a threshold, then a secondary set of attributes is used to supplement the primary lexicon so that the projection is above the threshold.
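The fallback described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: it assumes attributes are lowercase word tokens, that the threshold is a count of matched attributes, and that the signature is a SHA-1 hash of the sorted projection. The names `unique_attributes`, `project`, and `document_signature` are hypothetical.

```python
import hashlib
import re


def unique_attributes(text):
    """Return the distinct lowercase word tokens in the text (assumed attribute type)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def project(text, primary, secondary, threshold):
    """Project a document onto the primary lexicon, adding the secondary
    lexicon only when the primary projection is below the threshold."""
    attrs = unique_attributes(text)
    projection = attrs & primary
    if len(projection) < threshold:
        # Primary projection is too small: supplement it with secondary matches.
        projection = projection | (attrs & secondary)
    return projection


def document_signature(text, primary, secondary, threshold):
    """Hash the sorted projection so that word order does not affect the result."""
    canonical = " ".join(sorted(project(text, primary, secondary, threshold)))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()
```

Under these assumptions, two documents that project onto the same attributes receive the same signature regardless of word order or of words outside the lexicons, and a document with too few primary matches still gets a usable signature from the secondary lexicon.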
20 Claims
1. A method comprising:
generating, using at least one processor, a primary lexicon of attributes and a secondary lexicon of attributes;
determining unique attributes in a document;
determining an intersection between the unique attributes in the document and the primary lexicon;
determining the intersection is below a predetermined threshold;
based on the intersection being below the predetermined threshold, applying the secondary lexicon to augment the intersection; and
calculating a set of identifiers for the document based on the augmented intersection.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A system comprising:
at least one processor; and
at least one non-transitory computer readable medium storing instructions thereon that, when executed by the at least one processor, cause the system to:
generate a primary lexicon of attributes and a secondary lexicon of attributes;
determine unique attributes in a document;
determine an intersection between the unique attributes in the document and the primary lexicon;
determine that the intersection is below a predetermined threshold;
based on the intersection being below the predetermined threshold, apply the secondary lexicon to augment the intersection; and
calculate a set of identifiers for the document based on the augmented intersection.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
Specification