Document classification and characterization
First Claim
1. A method comprising:
receiving, by at least one data processor, data characterizing each of a plurality of documents within a document set;
grouping, by the at least one data processor, the plurality of documents into a plurality of stacks using one or more grouping algorithms;
identifying, by the at least one data processor, a prime document for each stack, the prime document including attributes representative of the entire stack;
providing, by the at least one data processor, data characterizing documents for each stack including at least the identified prime document to at least one human reviewer;
receiving, by the at least one data processor, user-generated input from the at least one human reviewer categorizing each provided document;
sending, by the at least one data processor, data characterizing supplemental documents within a stack other than the provided documents to enable the at least one human reviewer to review a digital representation of such supplemental documents for quality control; and
selecting, by the at least one data processor, randomized and stratified supplemental documents whose data is sent to the at least one human reviewer for quality control based on an algorithm designed to select documents based on their likelihood to require remediation, the selecting comprising:
stratifying the supplemental documents by weighting the corresponding documents by tier representation; and
randomizing the supplemental documents within pre-defined parameters comprising the weighting.
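The stratify-then-randomize selection recited in the final steps of the claim can be illustrated with a short sketch. This is only one possible reading, not the patented implementation: the function name `select_qc_sample`, the `(doc_id, tier)` pair representation, and the `tier_weights` mapping (used here as a stand-in for "likelihood to require remediation") are all assumptions introduced for illustration.

```python
import random
from collections import defaultdict


def select_qc_sample(supplemental, tier_weights, sample_size, seed=None):
    """Pick a stratified random QC sample from a stack's supplemental documents.

    supplemental: list of (doc_id, tier) pairs (hypothetical representation).
    tier_weights: dict mapping tier -> relative weight (assumed proxy for how
                  likely documents in that tier are to need remediation).
    """
    rng = random.Random(seed)

    # Stratify: bucket the supplemental documents by tier.
    by_tier = defaultdict(list)
    for doc_id, tier in supplemental:
        by_tier[tier].append(doc_id)

    total_weight = sum(tier_weights.get(tier, 0) for tier in by_tier)
    if total_weight == 0 or sample_size <= 0:
        return []

    selected = []
    for tier in sorted(by_tier):
        docs = by_tier[tier]
        # Weight each tier's quota by its share of the total weight.
        share = tier_weights.get(tier, 0) / total_weight
        quota = min(len(docs), round(sample_size * share))
        # Randomize within the tier, bounded by the weighted quota.
        selected.extend(rng.sample(docs, quota))
    return selected
```

Because each tier's quota is rounded independently, the returned sample can be slightly smaller or larger than `sample_size`; a production system would need an explicit rule for distributing the remainder.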
Abstract
Data is received that characterizes each of a plurality of documents within a document set. Based on this data, the plurality of documents are grouped into a plurality of stacks using one or more grouping algorithms. A prime document is identified for each stack that includes attributes representative of the entire stack. Subsequently, data characterizing documents for each stack, including at least the identified prime document, is provided to at least one human reviewer. User-generated input that categorizes each provided document is later received from the human reviewer, and data characterizing the user-generated input can then be provided. Related apparatus, systems, techniques and articles are also described.
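The grouping and prime-document steps summarized in the abstract can be sketched as follows. The text does not say which grouping algorithms are used or how the prime document is chosen, so this sketch makes its own assumptions: stacks are built greedily from Jaccard similarity over word sets, and the prime document is taken to be the medoid (the document most similar to the rest of its stack). The names `tokens`, `jaccard`, `group_into_stacks`, and `prime_document` and the `threshold` parameter are hypothetical.

```python
def tokens(text):
    """Represent a document as a set of lowercase words (assumed feature)."""
    return frozenset(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_into_stacks(docs, threshold=0.5):
    """Greedy grouping: a document joins the first stack whose first
    document is at least `threshold` similar, else it starts a new stack.

    docs: dict mapping doc_id -> text.
    Returns (stacks, token_sets), where stacks is a list of doc_id lists.
    """
    token_sets = {doc_id: tokens(text) for doc_id, text in docs.items()}
    stacks = []
    for doc_id in docs:
        for stack in stacks:
            if jaccard(token_sets[doc_id], token_sets[stack[0]]) >= threshold:
                stack.append(doc_id)
                break
        else:
            stacks.append([doc_id])
    return stacks, token_sets


def prime_document(stack, token_sets):
    """Medoid: the document with the highest total similarity to its stack."""
    return max(
        stack,
        key=lambda d: sum(jaccard(token_sets[d], token_sets[o]) for o in stack),
    )
```

Under these assumptions, a reviewer would be shown `prime_document(stack, token_sets)` for each stack, and the categorization given to that document would stand in for the rest of the stack pending quality control.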
21 Claims
1. A method comprising:
receiving, by at least one data processor, data characterizing each of a plurality of documents within a document set;
grouping, by the at least one data processor, the plurality of documents into a plurality of stacks using one or more grouping algorithms;
identifying, by the at least one data processor, a prime document for each stack, the prime document including attributes representative of the entire stack;
providing, by the at least one data processor, data characterizing documents for each stack including at least the identified prime document to at least one human reviewer;
receiving, by the at least one data processor, user-generated input from the at least one human reviewer categorizing each provided document;
sending, by the at least one data processor, data characterizing supplemental documents within a stack other than the provided documents to enable the at least one human reviewer to review a digital representation of such supplemental documents for quality control; and
selecting, by the at least one data processor, randomized and stratified supplemental documents whose data is sent to the at least one human reviewer for quality control based on an algorithm designed to select documents based on their likelihood to require remediation, the selecting comprising:
stratifying the supplemental documents by weighting the corresponding documents by tier representation; and
randomizing the supplemental documents within pre-defined parameters comprising the weighting.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 21
15. An article of manufacture comprising:
computer executable instructions stored on non-transitory computer readable media, which, when executed by a computer, causes the computer to perform operations comprising:
receiving data characterizing each of a plurality of documents within a document set;
grouping the plurality of documents into a plurality of stacks using one or more grouping algorithms;
identifying a prime document for each stack, the prime document including attributes representative of the entire stack;
providing data characterizing documents for each stack including at least the identified prime document to at least one human reviewer;
receiving user-generated input from the at least one human reviewer categorizing each provided document;
sending data characterizing supplemental documents within a stack other than the provided documents to enable the at least one human reviewer to review a digital representation of such supplemental documents for quality control;
defining tiers of documents within each stack, and wherein the supplemental documents comprise documents from two or more tiers; and
selecting randomized and stratified supplemental documents whose data is sent to the at least one human reviewer for quality control based on an algorithm designed to select documents based on their likelihood to require remediation, the selecting comprising:
stratifying the supplemental documents by weighting the corresponding documents by tier representation; and
randomizing the supplemental documents within pre-defined parameters comprising the weighting.
Dependent claims: 16, 17, 18, 19
20. A system comprising:
at least one data processor; and
memory storing instructions, which when executed by the at least one data processor, result in operations comprising:
receiving data characterizing each of a plurality of documents within a document set;
grouping the plurality of documents into a plurality of stacks using one or more grouping algorithms;
identifying a prime document for each stack, the prime document including attributes representative of the entire stack;
providing data characterizing documents for each stack including at least the identified prime document to at least one human reviewer;
receiving user-generated input from the human reviewer categorizing each provided document;
sending data characterizing supplemental documents within a stack other than the provided documents to enable the at least one human reviewer to review a digital representation of such supplemental documents for quality control; and
selecting randomized and stratified supplemental documents whose data is sent to the at least one human reviewer for quality control based on an algorithm designed to select documents based on their likelihood to require remediation, the selecting comprising:
stratifying the supplemental documents by weighting the corresponding documents by tier representation; and
randomizing the supplemental documents within pre-defined parameters comprising the weighting.
Specification