Systems and methods for predictive coding

US 8,489,538 B1
Filed: 09/21/2012
Issued: 07/16/2013
Est. Priority Date: 05/25/2010
Status: Active Grant

Abstract

Systems and methods for analyzing documents are provided herein. A plurality of documents and user input are received via a computing device. The user input includes hard coding of a subset of the plurality of documents, based on an identified subject or category. Instructions stored in memory are executed by a processor to generate an initial control set, analyze the initial control set to determine at least one seed set parameter, automatically code a first portion of the plurality of documents based on the initial control set and the seed set parameter associated with the identified subject or category, analyze the first portion of the plurality of documents by applying an adaptive identification cycle, and retrieve a second portion of the plurality of documents based on a result of the application of the adaptive identification cycle test on the first portion of the plurality of documents.
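
The first two steps described above (culling the corpus, then drawing an initial control set by random sampling) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Document` type, the `custodian` metadata field, and the function names are hypothetical, and simple random sampling without replacement is assumed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: int
    custodian: str  # hypothetical metadata field used for culling
    text: str


def filter_by_metadata(docs, allowed_custodians):
    """Cull documents whose metadata falls outside the allowed set."""
    return [d for d in docs if d.custodian in allowed_custodians]


def draw_control_set(subset, size, seed=0):
    """Draw a simple random sample, without replacement, as the initial control set."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return rng.sample(subset, min(size, len(subset)))


# Toy corpus: even-numbered documents belong to "alice", odd ones to "bob".
corpus = [Document(i, "alice" if i % 2 == 0 else "bob", f"document {i}") for i in range(100)]
subset = filter_by_metadata(corpus, {"alice"})
control_set = draw_control_set(subset, size=10, seed=42)
```

A reviewer would then code the documents in `control_set` by hand for the identified subject or category before any automatic coding takes place.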

28 Claims

1. A method for analyzing a plurality of documents, comprising:
- receiving the plurality of documents via a computing device;
  
  filtering the plurality of documents to produce a subset of the plurality of documents;
  
  executing instructions stored in memory, wherein execution of the instructions by a processor generates an initial control set based on random sampling of the subset of the plurality of documents;
  
  receiving user input from the computing device, the user input based on an identified subject or category; and
  
  executing instructions stored in memory, wherein execution of the instructions by a processor:
  
  reviews the initial control set to determine at least one seed set parameter associated with the identified subject or category,
  
  automatically codes a first portion of the plurality of documents, based on the initial control set and the at least one seed set parameter associated with the identified subject or category,
  
  automatically codes a second portion of the plurality of documents resulting from an application of user analysis and an adaptive identification cycle, and
  
  adds the coded second portion of the plurality of documents to the initial control set.
  - 2. The method of claim 1, wherein receiving user input on the initial control set includes a validation of the initial control set.
  - 3. The method of claim 1, wherein the random sampling is on a static basis, and wherein generating the initial control set is further based on random sampling of the subset of the plurality of documents on a rolling load basis.
  - 4. The method of claim 1, wherein filtering further comprises culling the plurality of documents based on metadata.
  - 5. The method of claim 1, further comprising receiving user input from the computing device, the user input comprising a designation corresponding to key documents of the initial control set.
  - 7. The method of claim 1, further comprising executing instructions stored in memory, wherein execution of the instructions by a processor adds further documents to the plurality of documents on a rolling load basis.
  - 8. The method of claim 1, further comprising transmitting to a display the first portion of the plurality of documents, the display coupled to the computing device.
  - 9. The method of claim 1, further comprising:
    - receiving user input via the computing device, the user input corresponding to a confidence level; and
      
      executing instructions stored in memory, wherein execution of the instructions by the processor:
      
      calculates a statistic regarding machine-only accuracy rate, and
      
      compares a statistic regarding machine coding accuracy rate against user input based on a defined confidence interval.
  - 10. The method of claim 1, wherein the step of automatically coding a first portion of the plurality of documents further comprises executing instructions stored in memory, wherein execution of the instructions by the processor automatically codes based on probabilistic latent semantic analysis and support vector machine analysis of the first portion of the plurality of documents.
  - 11. The method of claim 1, further comprising applying targeted document identification on the plurality of documents.
  - 12. The method of claim 11, wherein applying further comprises identifying a candidate key document based on one or more search terms.
  - 13. The method of claim 11, wherein applying further comprises identifying a candidate key document based on sampling based on one or more phrases.
  - 14. The method of claim 11, wherein applying further comprises identifying a candidate key document using a location filter.
  - 15. The method of claim 11, wherein applying further comprises identifying a candidate key document based on sampling and extrapolation.
  - 16. The method of claim 1, further comprising applying confidence threshold validation testing on one or more of the plurality of documents.
  - 17. The method of claim 16, wherein applying confidence threshold validation testing comprises:
    - setting a size of a quality control (QC) sample set at the size of the initial control set,
      
      creating the QC sample set by random sampling from unreviewed documents, and
      
      reviewing the QC sample set.
  - 18. The method of claim 1, further comprising adding documents on a rolling load basis for automatic coding based on received user input regarding the initial control set documents.
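
Claims 9, 16, and 17 describe comparing machine coding accuracy against a user-supplied confidence level, using a QC sample the same size as the initial control set. The sketch below shows one way such a check could be computed. The claims do not name a particular statistic, so the Wilson score interval used here is an assumption, as are all function names and the rule of comparing the interval's lower bound with a target accuracy.

```python
import math
import random
from statistics import NormalDist


def draw_qc_sample(unreviewed_ids, control_set_size, seed=0):
    """Randomly sample a QC set, sized like the control set, from unreviewed documents."""
    rng = random.Random(seed)
    return rng.sample(unreviewed_ids, min(control_set_size, len(unreviewed_ids)))


def wilson_interval(correct, total, confidence=0.95):
    """Wilson score interval for a binomial proportion (assumed statistic)."""
    if total <= 0:
        raise ValueError("total must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def meets_threshold(machine_codes, reviewer_codes, target_accuracy, confidence=0.95):
    """True when the lower bound of the accuracy interval reaches the target."""
    correct = sum(machine_codes[d] == reviewer_codes[d] for d in reviewer_codes)
    low, high = wilson_interval(correct, len(reviewer_codes), confidence)
    return low >= target_accuracy, (low, high)


# Hypothetical data: the reviewer disagrees with the machine on 10 of 100 QC documents.
unreviewed = list(range(1000))
qc_ids = draw_qc_sample(unreviewed, control_set_size=100, seed=7)
machine = {d: True for d in qc_ids}
reviewer = {d: (i >= 10) for i, d in enumerate(qc_ids)}
ok, (low, high) = meets_threshold(machine, reviewer, target_accuracy=0.80)
```

Comparing the lower bound rather than the point estimate is a conservative choice: a small QC sample widens the interval and makes the threshold harder to meet.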

6. A method for analyzing a plurality of documents, further comprising:
- receiving the plurality of documents via a computing device;
  
  filtering the plurality of documents to produce a subset of the plurality of documents;
  
  executing instructions stored in memory, wherein execution of the instructions by a processor generates an initial control set based on random sampling of the subset of the plurality of documents;
  
  receiving user input from the computing device, the user input based on an identified subject or category; and
  
  executing instructions stored in memory, wherein execution of the instructions by a processor:
  
  reviews the initial control set to determine at least one seed set parameter associated with the identified subject or category,
  
  automatically codes a first portion of the plurality of documents, based on the initial control set and the at least one seed set parameter associated with the identified subject or category,
  
  analyzes the first portion of the plurality of documents by applying an adaptive identification cycle, the adaptive identification cycle being based on the initial control set, user validation of the automated coding of the first portion of the plurality of documents and confidence threshold validation, and
  
  retrieves a second portion of the plurality of documents based on a result of the application of the adaptive identification cycle on the first portion of the plurality of documents.
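
The adaptive identification cycle recited in claim 6 can be pictured as a loop: code the first portion, have a user validate the result, test against a threshold, and feed corrections back before retrieving a second portion. The sketch below is only an illustration under stated assumptions. A toy term-count scorer stands in for the classifier (claim 10 names probabilistic latent semantic analysis and support vector machines, which are not implemented here), the `reviewer` callable stands in for user validation, and a plain accuracy threshold stands in for confidence threshold validation. All names are hypothetical.

```python
from collections import Counter


def train(control_set):
    """Score each term: +1 per responsive document containing it, -1 per non-responsive one."""
    pos, neg = Counter(), Counter()
    for text, label in control_set:
        (pos if label else neg).update(set(text.lower().split()))
    return {term: pos[term] - neg[term] for term in set(pos) | set(neg)}


def auto_code(weights, text):
    """Code a document as responsive when its summed term scores are positive."""
    return sum(weights.get(term, 0) for term in set(text.lower().split())) > 0


def adaptive_cycle(control_set, first_portion, validate, target_accuracy, max_rounds=5):
    """Retrain on reviewer corrections until accuracy on the first portion meets the target."""
    if not first_portion:
        raise ValueError("first_portion must not be empty")
    control_set = list(control_set)
    weights, accuracy, rounds = {}, 0.0, 0
    for rounds in range(1, max_rounds + 1):
        weights = train(control_set)
        corrections = []
        for text in first_portion:
            truth = validate(text)  # stand-in for user validation
            if auto_code(weights, text) != truth:
                corrections.append((text, truth))
        accuracy = 1 - len(corrections) / len(first_portion)
        if accuracy >= target_accuracy:
            break
        control_set.extend(corrections)  # corrected documents join the control set
    return weights, accuracy, rounds


def reviewer(text):
    # Hypothetical reviewer: responsive documents mention a merger or an acquisition.
    return "merger" in text or "acquisition" in text


control = [("merger agreement draft", True), ("lunch menu friday", False)]
first_portion = [
    "merger timeline",
    "friday lunch order",
    "acquisition agreement terms",
    "acquisition price memo",
    "team lunch photos",
]
weights, accuracy, rounds = adaptive_cycle(control, first_portion, reviewer, target_accuracy=0.9)

# Retrieve a second portion from the remaining documents using the refined model.
remaining = ["acquisition financing plan", "friday parking notice"]
second_portion = [t for t in remaining if auto_code(weights, t)]
```

In this toy run the first pass miscodes one document, the correction is added to the control set, and the second pass meets the target, after which the refined scores select the second portion.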

19. A method for analyzing a plurality of documents, comprising:
- receiving the plurality of documents via a computing device;
  
  filtering the plurality of documents to produce a subset of the plurality of documents;
  
  executing instructions stored in memory, wherein execution of the instructions by a processor generates an initial control set based on random sampling of the subset of the plurality of documents on a rolling load basis;
  
  receiving user input from the computing device, the user input based on an identified subject or category; and
  
  executing instructions stored in memory, wherein execution of the instructions by a processor:
  
  reviews the initial control set to determine at least one seed set parameter associated with the identified subject or category,
  
  automatically codes a first portion of the plurality of documents, based on the initial control set and the at least one seed set parameter associated with the identified subject or category,
  
  automatically codes a second portion of the plurality of documents resulting from an application of user analysis and an adaptive identification cycle, and
  
  adds the coded second portion of the plurality of documents to the initial control set.
  - 20. The method of claim 19, wherein receiving user input on the initial control set includes a validation of the initial control set.
  - 21. The method of claim 19, wherein filtering further comprises culling the plurality of documents based on metadata.
  - 22. The method of claim 19, further comprising receiving user input from the computing device, the user input comprising a designation corresponding to key documents of the initial control set.
  - 24. The method of claim 19, further comprising:
    - receiving user input via the computing device, the user input corresponding to a confidence level; and
      
      executing instructions stored in memory, wherein execution of the instructions by the processor:
      
      calculates a statistic regarding machine-only accuracy rate, and
      
      compares a statistic regarding machine coding accuracy rate against user input based on a defined confidence interval.
  - 25. The method of claim 19, wherein the step of automatically coding a first portion of the plurality of documents further comprises executing instructions stored in memory, wherein execution of the instructions by the processor automatically codes based on probabilistic latent semantic analysis and support vector machine analysis of the first portion of the plurality of documents.
  - 26. The method of claim 19, further comprising applying targeted document identification on the plurality of documents.
  - 27. The method of claim 19, further comprising:
    - adding additional documents to the plurality of documents on a rolling load basis;
      
      wherein generating the initial control set based on random sampling of the subset of the plurality of documents on a rolling load basis includes generating the initial control set based on the random sampling of the additional documents added on a rolling load basis.
  - 28. The method of claim 19, further comprising:
    - adding additional documents to the plurality of documents on a rolling load basis;
      
      wherein generating the initial control set is further based on random sampling of the subset of the plurality of documents on a static basis; and
      
      wherein generating the initial control set includes supplementing the random sampling on a static basis based on random sampling of the additional documents added on a rolling load basis.
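
Claims 27 and 28 describe a control set that starts from a random sample of a static document set and is supplemented by random sampling of documents added on a rolling load basis. One simple way to picture this, assuming each load is sampled at the same fixed rate (the claims do not specify a rate or a sampling scheme), is sketched below. The class and function names are hypothetical.

```python
import random


def sample_at_rate(docs, rate, rng):
    """Randomly sample a fixed fraction of docs, without replacement."""
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    return rng.sample(docs, round(len(docs) * rate))


class RollingControlSet:
    """Control set seeded from a static load and supplemented from each rolling load."""

    def __init__(self, static_docs, rate, seed=0):
        self._rng = random.Random(seed)
        self.rate = rate
        self.members = sample_at_rate(list(static_docs), rate, self._rng)

    def add_load(self, new_docs):
        """Sample the newly loaded documents at the same rate and append them."""
        extra = sample_at_rate(list(new_docs), self.rate, self._rng)
        self.members.extend(extra)
        return extra


# Static load of 1,000 document ids, followed by one rolling load of 200 more.
control = RollingControlSet(range(1000), rate=0.05, seed=1)
initial_size = len(control.members)
extra = control.add_load(range(1000, 1200))
```

Sampling every load at the same rate keeps the control set roughly proportional to the corpus as it grows; the newly sampled documents would still need reviewer coding before they inform automatic coding.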

23. A method for analyzing a plurality of documents, comprising:
- receiving the plurality of documents via a computing device;
  
  filtering the plurality of documents to produce a subset of the plurality of documents;
  
  executing instructions stored in memory, wherein execution of the instructions by a processor generates an initial control set based on random sampling of the subset of the plurality of documents on a rolling load basis;
  
  receiving user input from the computing device, the user input based on an identified subject or category; and
  
  executing instructions stored in memory, wherein execution of the instructions by a processor:
  
  reviews the initial control set to determine at least one seed set parameter associated with the identified subject or category,
  
  automatically codes a first portion of the plurality of documents, based on the initial control set and the at least one seed set parameter associated with the identified subject or category,
  
  analyzes the first portion of the plurality of documents by applying an adaptive identification cycle, the adaptive identification cycle being based on the initial control set, user validation of the automated coding of the first portion of the plurality of documents and confidence threshold validation, and
  
  retrieves a second portion of the plurality of documents based on a result of the application of the adaptive identification cycle on the first portion of the plurality of documents.

Current Assignee: Open Text Holdings, Inc. (Open Text Corporation)
Original Assignee: Recommind Incorporated (Open Text Corporation)
Inventors: Puzicha, Jan; Vranas, Steve
Primary Examiner(s): Chaki, Kakali
Assistant Examiner(s): SITIRICHE, LUIS A

Application Number: US13/624,854
Time in Patent Office: 298 Days
Field of Search: None
US Class Current: 706/52

CPC Class Codes:
G06F 16/93   Document management systems
G06N 20/00   Machine learning
G06N 20/10   using kernel methods, e.g. ...
G06N 5/04   Inference or reasoning models
G06N 5/048   Fuzzy inferencing
G06N 7/01   Probabilistic graphical mod...
