Document grouping system
First Claim
1. A computer-implemented system for grouping documents, the system comprising:
a non-transitory document storage system comprising computer memory configured to store a plurality of documents, wherein each document of the plurality of documents comprises distinct character types;
a computerized matching unit comprising one or more hardware processors, wherein the computerized matching unit is configured to access the non-transitory document storage system and generate:
a first indicator of a common character count between a first document of the plurality of documents and a second document of the plurality of documents, wherein the common character count corresponds to a number of distinct character type occurrences in both the first document and the second document;
a second indicator of a character variance count between the first document and the second document, wherein the character variance count corresponds to differences in a number of occurrences of distinct character types in both the first document and the second document;
a third indicator of a missing character count between the first document and the second document, wherein the missing character count corresponds to a number of distinct character type occurrences in the first document and not in the second document;
a single indicator by combining at least the first indicator, the second indicator, and the third indicator, wherein the computerized matching unit is configured to compare the single indicator to a threshold indicator to determine whether there is a match between the first document and the second document; and
a grouping based at least in part on the single indicator; and
a matching reporting unit configured to report the grouping generated by the computerized matching unit to a user.
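The claim leaves the exact arithmetic open. The Python sketch below shows one way the three indicators and the single indicator might be computed. It assumes that a "distinct character type" is simply a distinct Unicode character, that the common count is the number of character types present in both documents, that the variance count sums the absolute count differences over those shared types, and that the single indicator is a hypothetical weighted difference in which a higher value means more similar documents. The function names and weights are illustrative and are not taken from the patent.

```python
from collections import Counter

def character_indicators(first: str, second: str) -> tuple[int, int, int]:
    """Return (common, variance, missing) for a pair of documents.

    Assumed reading of the claim language (not confirmed by the patent text):
      common   = number of distinct characters that occur in both documents
      variance = sum of |count_first - count_second| over those shared characters
      missing  = number of distinct characters in `first` that never occur in `second`
    """
    counts_first = Counter(first)
    counts_second = Counter(second)
    shared = counts_first.keys() & counts_second.keys()
    common = len(shared)
    variance = sum(abs(counts_first[ch] - counts_second[ch]) for ch in shared)
    missing = len(counts_first.keys() - counts_second.keys())
    return common, variance, missing

def single_indicator(common: int, variance: int, missing: int,
                     w_variance: float = 1.0, w_missing: float = 1.0) -> float:
    """Hypothetical combination: shared types raise the score, differences lower it."""
    return common - w_variance * variance - w_missing * missing

def is_match(first: str, second: str, threshold: float) -> bool:
    """Compare the single indicator to a threshold (assumed: higher means more similar)."""
    return single_indicator(*character_indicators(first, second)) >= threshold
```

For example, character_indicators("aab", "abc") returns (2, 1, 0), while character_indicators("abc", "aab") returns (2, 1, 1): under this reading the missing count is directional, because it only counts character types present in the first document.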
5 Assignments
0 Petitions
Abstract
Computer-based techniques for grouping documents are described herein. Documents may be grouped, organized, named, and/or indexed by their document character features. Document character features may comprise character counts, character difference counts, missing character counts, and any combination thereof. The comparison of documents may use a comparison threshold value for grouping documents. Documents may be processed in any language.
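Threshold-based grouping, as summarized in the abstract, can be illustrated with a minimal Python sketch. It assumes a hypothetical score that rewards shared character types and penalizes count differences and character types found in only one of the two documents (a symmetric variant of the missing count, chosen here so the score does not depend on argument order), plus a simple greedy pass that compares each document with the first member of each existing group. Because Python strings are sequences of Unicode code points, the same code handles text in any script.

```python
from collections import Counter
from typing import Callable

def char_score(a: str, b: str) -> int:
    """Hypothetical similarity from character features (higher = more similar)."""
    counts_a, counts_b = Counter(a), Counter(b)
    shared = counts_a.keys() & counts_b.keys()
    # Count differences for character types that both documents contain.
    difference = sum(abs(counts_a[ch] - counts_b[ch]) for ch in shared)
    # Character types present in only one document (symmetric, so order does not matter).
    missing = len(counts_a.keys() ^ counts_b.keys())
    return len(shared) - difference - missing

def group_documents(docs: list[str], threshold: int,
                    score: Callable[[str, str], int] = char_score) -> list[list[str]]:
    """Greedy one-pass grouping against each group's first member."""
    groups: list[list[str]] = []
    for doc in docs:
        for group in groups:
            if score(group[0], doc) >= threshold:
                group.append(doc)
                break
        else:
            # No existing group matched: start a new one with this document.
            groups.append([doc])
    return groups
```

With the documents ["invoice 2021", "invoice 2022", "請求書 2021", "請求書 2022"] and a threshold of 3, this sketch returns two groups: the two English titles together and the two Japanese titles together. Comparing only against a group's first member keeps the pass simple, but the result can depend on input order; comparing against every member or using a dedicated clustering algorithm would reduce that sensitivity at a higher computational cost.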
39 Citations
20 Claims
10. A system for grouping documents, the system comprising:
one or more computing devices comprising one or more hardware processors, programmed, via executable code instructions, to implement:
a matching unit for comparing a plurality of documents, the matching unit configured to receive as input a first document and a second document, wherein each document of the plurality of documents comprises distinct character types, and wherein the matching unit is further configured to generate:
a first indicator of a common character count between the first document and the second document, wherein the common character count corresponds to a number of distinct character type occurrences in both the first document and the second document;
a second indicator of a character variance count between the first document and the second document, wherein the character variance count corresponds to differences in a number of occurrences of distinct character types in both the first document and the second document;
a single indicator by combining at least the first indicator and the second indicator, wherein the matching unit is configured to compare the single indicator to a threshold indicator to determine whether there is a match between the first document and the second document; and
a grouping based at least in part on the single indicator; and
a matching reporting unit adapted to report the grouping generated by the matching unit.
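Claim 10 combines only the first and second indicators. The hedged Python sketch below assumes that distinct Unicode characters serve as character types and uses hypothetical scoring functions; it shows how leaving out the missing character count can change the outcome when the second document is a short subset of the first.

```python
from collections import Counter

def indicators(first: str, second: str) -> dict[str, int]:
    """Assumed reading: distinct Unicode characters serve as character types."""
    a, b = Counter(first), Counter(second)
    shared = a.keys() & b.keys()
    return {
        "common": len(shared),
        "variance": sum(abs(a[ch] - b[ch]) for ch in shared),
        "missing": len(a.keys() - b.keys()),
    }

def two_indicator_score(ind: dict[str, int]) -> int:
    """Hypothetical claim-10-style score: first and second indicators only."""
    return ind["common"] - ind["variance"]

def three_indicator_score(ind: dict[str, int]) -> int:
    """Hypothetical claim-1-style score: also penalizes missing character types."""
    return ind["common"] - ind["variance"] - ind["missing"]

if __name__ == "__main__":
    ind = indicators("note 2023 final", "note")
    threshold = 2
    print(two_indicator_score(ind) >= threshold)    # match with two indicators
    print(three_indicator_score(ind) >= threshold)  # no match with three
```

For "note 2023 final" against "note", both scores start from 4 shared character types and a variance of 1, so the two-indicator score is 3 and clears a threshold of 2, while the eight character types missing from "note" pull the three-indicator score down to -5.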
17. Non-transitory computer storage comprising instructions for causing a computer system to group documents by a process that comprises:
receiving a first document and a second document, wherein each of the first document and the second document comprises distinct character types;
generating a first indicator of a common character count between the first document and the second document, wherein the common character count corresponds to a number of distinct character type occurrences in both the first document and the second document;
generating a second indicator of a character variance count between the first document and the second document, wherein the character variance count corresponds to differences in a number of occurrences of distinct character types in both the first document and the second document;
generating a third indicator of a missing character count between the first document and the second document, wherein the missing character count corresponds to a number of distinct character type occurrences in the first document and not in the second document;
generating a single indicator by combining at least the first indicator, the second indicator, and the third indicator;
comparing the single indicator to a threshold indicator to determine whether there is a match between the first document and the second document; and
transmitting matching information indicative of the comparison of the single indicator to the threshold indicator over a network to a computing device.
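Claim 17 ends by transmitting matching information over a network. A minimal Python sketch of that step follows. It assumes a JSON message whose field names are hypothetical, a "match" flag that treats a single indicator at or above the threshold as a match, and an HTTP POST built with the standard library's urllib.request. The endpoint URL is a placeholder; the request is only constructed here, and passing it to urllib.request.urlopen would send it.

```python
import json
import urllib.request

def build_matching_message(doc_ids: list[str], indicators: dict[str, int],
                           single: float, threshold: float) -> bytes:
    """Serialize matching information as UTF-8 JSON (hypothetical field names)."""
    payload = {
        "documents": doc_ids,
        "indicators": indicators,
        "single_indicator": single,
        "threshold": threshold,
        # Assumption: a higher single indicator means more similar documents.
        "match": single >= threshold,
    }
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")

def make_request(url: str, body: bytes) -> urllib.request.Request:
    """Prepare an HTTP POST carrying the message; it is not sent here."""
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    body = build_matching_message(
        ["doc-1", "doc-2"], {"common": 4, "variance": 1, "missing": 8}, -5, 2
    )
    request = make_request("http://example.com/matches", body)  # placeholder URL
    print(request.get_method(), body.decode("utf-8"))
```

Keeping message construction separate from the network call makes the payload easy to inspect or test without a live endpoint.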
Specification