Document classification and characterization

US 9,703,863 B2
Filed: 03/11/2013
Issued: 07/11/2017
Est. Priority Date: 01/26/2011
Status: Active Grant

10 Assignments

0 Petitions

Abstract

Data is received that characterizes each of a plurality of documents within a document set. Based on this data, the plurality of documents are grouped into a plurality of stacks using one or more grouping algorithms. A prime document is identified for each stack that includes attributes representative of the entire stack. Subsequently, data that characterizes documents for each stack, including at least the identified prime document, is provided to at least one human reviewer. User-generated input categorizing each provided document is later received from the human reviewer, and data characterizing the user-generated input can then be provided. Related apparatus, systems, techniques and articles are also described.
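The grouping and prime-document steps in the abstract can be illustrated with a minimal Python sketch. The patent does not name a specific grouping algorithm, so everything here is an assumption made for illustration: the Jaccard similarity over word sets, the greedy seed-based grouping, the `threshold` parameter, and the choice of the prime as the member most similar to the rest of its stack. The function names `jaccard`, `group_into_stacks`, and `pick_prime` are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_into_stacks(docs: Dict[str, str], threshold: float = 0.4) -> List[List[str]]:
    """Greedy grouping (an assumed algorithm): a document joins the first
    stack whose seed document it resembles, otherwise it seeds a new stack."""
    tokens = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
    stacks: List[List[str]] = []
    for doc_id in docs:
        for stack in stacks:
            if jaccard(tokens[doc_id], tokens[stack[0]]) >= threshold:
                stack.append(doc_id)
                break
        else:
            # No existing stack matched, so this document starts its own.
            stacks.append([doc_id])
    return stacks


def pick_prime(stack: List[str], docs: Dict[str, str]) -> str:
    """Pick the member with the highest total similarity to its stack,
    one possible reading of 'representative of the entire stack'."""
    tokens = {d: set(docs[d].lower().split()) for d in stack}
    return max(stack, key=lambda d: sum(jaccard(tokens[d], tokens[o]) for o in stack))
```

Under these assumptions, only the prime of each stack would need to go to the human reviewer first; the remaining members stay available as supplemental documents.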

21 Claims

1. A method comprising:
- receiving, by at least one data processor, data characterizing each of a plurality of documents within a document set;
  
  grouping, by the at least one data processor, the plurality of documents into a plurality of stacks using one or more grouping algorithms;
  
  identifying, by the at least one data processor, a prime document for each stack, the prime document including attributes representative of the entire stack;
  
  providing, by the at least one data processor, data characterizing documents for each stack including at least the identified prime document to at least one human reviewer;
  
  receiving, by the at least one data processor, user-generated input from the at least one human reviewer categorizing each provided document;
  
  sending, by the at least one data processor, data characterizing supplemental documents within a stack other than the provided documents to enable the at least one human reviewer to review a digital representation of such supplemental documents for quality control; and
  
  selecting, by the at least one data processor, randomized and stratified supplemental documents whose data is sent to the at least one human reviewer for quality control based on an algorithm designed to select documents based on their likelihood to require remediation, the selecting comprising;
  
  stratifying the supplemental documents by weighting the corresponding documents by tier representation; and
  
  randomizing the supplemental documents within pre-defined parameters comprising the weighting.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 21)
- - 2. The method as in claim 1, further comprising:
    - providing, by the at least one data processor, data characterizing the user-generated input.
  - 3. The method as in claim 1, wherein categorization of each provided document by the user-generated input is propagated to all documents within the corresponding stack.
  - 4. The method as in claim 1, wherein the at least one human reviewer categorizes each provided document in a group of document review categories.
  - 5. The method as in claim 4, wherein the document review categories are selected from a group comprising:
    - relevance, responsiveness, and privilege.
  - 6. The method as in claim 1, further comprising:
    - defining, by the at least one data processor, tiers of documents within each stack, and wherein the supplemental documents comprise documents from one or more tiers.
  - 7. The method as in claim 6, wherein the tiers are based on one or more of:
    - document similarity relative to the corresponding prime document, document type, document author, document sender, and document recipient.
  - 8. The method as in claim 3, wherein documents are incrementally added to the document set after the grouping of the plurality of documents into the plurality of stacks, and wherein the method further comprises:
    - associating, by the at least one data processor, the incrementally added documents to one of the plurality of stacks; and
      
      for each stack;
      
      if the stack has already been categorized, adding, by the at least one data processor, the corresponding incrementally added documents to the stack and propagating the categorization to the incrementally added documents in such stack;
      
      or if the stack has not been categorized, adding, by the at least one data processor, the incrementally added documents to the stack.
  - 9. The method as in claim 3, wherein at least one document is incrementally added to the document set after the grouping of the plurality of documents into the plurality of stacks, and wherein the method further comprises:
    - determining, by the at least one data processor, that the at least one incrementally added document is not associated with a previously defined stack; and
      
      defining, by the at least one data processor, a new stack including the at least one incrementally added document.
  - 10. The method as in claim 3, further comprising:
    - defining, by the at least one data processor, hierarchical relationships among the plurality of documents within the set of documents; and
      
      wherein the grouping algorithms take into account the relationships between documents when grouping the plurality of documents into the plurality of stacks.
  - 11. The method as in claim 1, wherein the documents in the stacks are disjoint.
  - 12. The method as in claim 2, wherein providing the data comprises one or more of:
    - displaying the data, transmitting the data to a remote computing system, and persisting the data.
  - 13. The method as in claim 1, wherein the data characterizing documents for each stack provided to the at least one human reviewer comprise reference numbers for the documents.
  - 14. The method as in claim 1, wherein the data characterizing documents for each stack provided to the at least one human reviewer comprise digitally scanned versions of such documents.
  - 21. The method as in claim 1, wherein the data characterizing supplemental documents within the stack other than the provided documents are sent after receiving user-generated input from the human reviewer.
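The final elements of claim 1 (stratifying supplemental documents by tier weighting, then randomizing within those bounds) can be sketched as follows. This is an illustrative reading, not the patented algorithm: the claim does not say how weights are computed, so the `tier_weight` mapping, the proportional quota formula, and the function name `select_qc_sample` are all assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def select_qc_sample(
    tier_of: Dict[str, int],
    tier_weight: Dict[int, float],
    sample_size: int,
    seed: Optional[int] = None,
) -> List[str]:
    """Pick supplemental documents for QC: stratify by tier, then
    randomize within each tier's quota."""
    rng = random.Random(seed)

    # Stratify: bucket the supplemental documents by tier.
    by_tier: Dict[int, List[str]] = defaultdict(list)
    for doc_id, tier in tier_of.items():
        by_tier[tier].append(doc_id)

    # Assumed weighting: tier representation (document count) multiplied
    # by a per-tier weight standing in for likelihood of remediation.
    score = {t: len(ids) * tier_weight.get(t, 1.0) for t, ids in by_tier.items()}
    total = sum(score.values())
    if total <= 0 or sample_size <= 0:
        return []

    selected: List[str] = []
    for tier in sorted(by_tier):
        ids = by_tier[tier]
        # Quota is proportional to the tier's score, capped at tier size.
        quota = min(len(ids), round(sample_size * score[tier] / total))
        # Randomize within the tier, bounded by that tier's quota.
        selected.extend(rng.sample(ids, quota))
    return selected
```

Because quotas are rounded per tier, the returned list can be slightly shorter or longer than `sample_size`; a production version would need an explicit rule for distributing the remainder.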

15. An article of manufacture comprising:
- computer executable instructions stored on non-transitory computer readable media, which, when executed by a computer, causes the computer to perform operations comprising;
  
  receiving data characterizing each of a plurality of documents within a document set;
  
  grouping the plurality of documents into a plurality of stacks using one or more grouping algorithms;
  
  identifying a prime document for each stack, the prime document including attributes representative of the entire stack;
  
  providing data characterizing documents for each stack including at least the identified prime document to at least one human reviewer;
  
  receiving user-generated input from the at least one human reviewer categorizing each provided document;
  
  sending data characterizing supplemental documents within a stack other than the provided documents to enable the at least one human reviewer to review a digital representation of such supplemental documents for quality control;
  
  defining tiers of documents within each stack, and wherein the supplemental documents comprise documents from two or more tiers; and
  
  selecting randomized and stratified supplemental documents whose data is sent to the at least one human reviewer for quality control based on an algorithm designed to select documents based on their likelihood to require remediation, the selecting comprising;
  
  stratifying the supplemental documents by weighting the corresponding documents by tier representation; and
  
  randomizing the supplemental documents within pre-defined parameters comprising the weighting.
- Dependent claims (16, 17, 18, 19)
- - 16. The article as in claim 15, wherein categorization of each provided document by the user-generated input is propagated to all documents within the corresponding stack.
  - 17. The article as in claim 15, wherein the human reviewer categorizes each provided document in a group of document review categories.
  - 18. The article as in claim 17, wherein the document review categories are selected from a group comprising:
    - relevance, responsiveness, and privilege.
  - 19. The article as in claim 15, wherein the tiers are based on one or more of:
    - document similarity relative to the corresponding prime document, document type, document author, document sender, document recipient.
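Claims 15 and 19 refer to tiers defined within each stack, with similarity to the prime document as one possible basis. A minimal sketch of that one basis is shown below. The similarity scores are taken as given inputs, and the `cutoffs` values, the tier numbering (tier 1 closest to the prime), and the function name `assign_tiers` are assumptions for illustration; the claims do not fix any of them.

```python
from bisect import bisect_right
from typing import Dict, Sequence


def assign_tiers(
    similarity_to_prime: Dict[str, float],
    cutoffs: Sequence[float] = (0.5, 0.8),
) -> Dict[str, int]:
    """Map each document to a tier number; tier 1 holds the documents
    most similar to the prime, the last tier the least similar."""
    ordered = sorted(cutoffs)
    n_tiers = len(ordered) + 1
    return {
        # bisect_right counts how many cutoffs the score meets or exceeds.
        doc_id: n_tiers - bisect_right(ordered, sim)
        for doc_id, sim in similarity_to_prime.items()
    }
```

With the default cutoffs, a score of 0.8 or higher lands in tier 1, a score from 0.5 up to but not including 0.8 lands in tier 2, and anything lower lands in tier 3. Other bases listed in the claim, such as document type or author, would need a different mapping.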

20. A system comprising:
- at least one data processor; and
  
  memory storing instructions, which when executed by the at least one data processor, result in operations comprising;
  
  receiving data characterizing each of a plurality of documents within a document set;
  
  grouping the plurality of documents into a plurality of stacks using one or more grouping algorithms;
  
  identifying a prime document for each stack, the prime document including attributes representative of the entire stack;
  
  providing data characterizing documents for each stack including at least the identified prime document to at least one human reviewer;
  
  receiving user-generated input from the human reviewer categorizing each provided document;
  
  sending data characterizing supplemental documents within a stack other than the provided documents to enable the at least one human reviewer to review a digital representation of such supplemental documents for quality control; and
  
  selecting randomized and stratified supplemental documents whose data is sent to the at least one human reviewer for quality control based on an algorithm designed to select documents based on their likelihood to require remediation, the selecting comprising;
  
  stratifying the supplemental documents by weighting the corresponding documents by tier representation; and
  
  randomizing the supplemental documents within pre-defined parameters comprising the weighting.
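The incremental-addition behavior in dependent claims 8 and 9 can be sketched in a few lines. This is a hedged illustration only: the `Stack` class, the `labels` dictionary, and the caller-supplied `match` predicate are hypothetical names, and the claims do not specify how a new document is associated with a stack.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Stack:
    prime: str
    members: List[str] = field(default_factory=list)
    category: Optional[str] = None  # None until a reviewer categorizes the stack


def add_document(
    doc_id: str,
    stacks: List[Stack],
    labels: Dict[str, str],
    match: Callable[[str, Stack], bool],
) -> Stack:
    """Attach a late-arriving document to a stack, or start a new one."""
    for stack in stacks:
        if match(doc_id, stack):
            stack.members.append(doc_id)
            if stack.category is not None:
                # Claim 8: an already categorized stack propagates its
                # categorization to the incrementally added document.
                labels[doc_id] = stack.category
            return stack
    # Claim 9: no previously defined stack fits, so define a new stack.
    new_stack = Stack(prime=doc_id, members=[doc_id])
    stacks.append(new_stack)
    return new_stack
```

Treating the first unmatched document as the prime of its new stack is a simplification; a fuller version would re-run prime selection as the stack grows.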

Current Assignee
Consilio LLC
Original Assignee
DiscoverReady LLC (Consilio LLC)
Inventors
Barsony, Stephen John; Wagner, Jr., James Kenneth; Messing, Yerachmiel Tzvi; Shub, David Matthew
Primary Examiner(s)
PYO, MONICA M

Application Number

US13/794,446
Publication Number

US 20130246426A1
Time in Patent Office

1,583 Days
Field of Search

707/737, 707/738, 707/739, 707/740, 707/752, 707/754, 704/9
US Class Current
CPC Class Codes

G06F 16/355   Class or cluster creation o...

G06F 16/358   Browsing; Visualisation the...

G06Q 10/00   Administration; Management
