System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith

US 9,881,080 B2
Filed: 07/15/2016
Issued: 01/30/2018
Est. Priority Date: 04/22/2009
Status: Active Grant

2 Assignments

0 Petitions

Abstract

An electronic document analysis method receiving N electronic documents pertaining to a case encompassing a set of issues including at least one issue and establishing relevance of at least the N documents to at least one individual issue in the set of issues, the method comprising, for at least one individual issue from among the set of issues, receiving an output of a categorization process applied to each document in training and control subsets of the at least N documents, the output including, for each document in the subsets, one of a relevant-to-the-individual issue indication and a non-relevant-to-the-individual issue indication; building a text classifier simulating the categorization process using the output for all documents in the training subset of documents; and running the text classifier on the at least N documents thereby to obtain a ranking of the extent of relevance of each of the at least N documents to the individual issue. The method may also comprise evaluating the text classifier's quality using the output for all documents in the control subset.
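
The abstract's train-then-rank flow can be illustrated with a small sketch. Everything here is an assumption made for illustration: the patent does not prescribe a model, so a multinomial naive Bayes scorer with add-one smoothing stands in for the "text classifier", and the names `train_classifier`, `score`, and `rank_documents` are hypothetical.

```python
import math
from collections import Counter


def train_classifier(labeled_docs):
    """Build per-token log-odds weights from (text, is_relevant) pairs.

    Assumption: naive Bayes with add-one smoothing stands in for the
    text classifier; the patent does not name a specific model.
    """
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for text, is_relevant in labeled_docs:
        counts[is_relevant].update(text.lower().split())
        n_docs[is_relevant] += 1
    vocab = set(counts[True]) | set(counts[False])
    # Smoothed token totals per class.
    totals = {label: sum(counts[label].values()) + len(vocab) for label in counts}
    weights = {
        tok: math.log((counts[True][tok] + 1) / totals[True])
        - math.log((counts[False][tok] + 1) / totals[False])
        for tok in vocab
    }
    prior = math.log((n_docs[True] + 1) / (n_docs[False] + 1))
    return weights, prior


def score(model, text):
    """Return a relevance score; higher means more likely relevant."""
    weights, prior = model
    return prior + sum(weights.get(tok, 0.0) for tok in text.lower().split())


def rank_documents(model, docs):
    """Order documents from most to least relevant to the issue."""
    return sorted(docs, key=lambda d: score(model, d), reverse=True)
```

In use, an expert's relevant and non-relevant decisions on the training subset would be passed to `train_classifier`, and `rank_documents` would then be applied to the full collection.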

15 Claims

1. A system comprising:
- one or more processors; and
  
  memory that stores instructions that are executable by the one or more processors to cause the system to perform operations comprising:
  
  receiving a first output of a categorization process applied to a training subset of documents of a plurality of documents, the first output including a first indication and a second indication for each document in the training set of documents, the first indication indicating a relevance between a document in the training set of documents and an issue in a set of issues and the second indication indicating a lack of relevance between the document and the issue;
  
  generating a classifier based at least in part on the first output;
  
  executing the classifier on the plurality of documents to determine a second output, the second output indicating an extent of relevance of each document in the plurality of documents to the issue;
  
  partitioning individual documents in the plurality of documents into subsets of documents based at least in part on the second output;
  
  adding additional documents from at least one subset of the subsets of documents into the training subset of documents to generate a control subset of documents;
  
  executing, as part of a first iteration, the classifier on the control subset of documents to determine a third output;
  
  determining, based at least in part on the third output, a threshold associated with the classifier, the threshold being associated with a cutoff point for binarizing a ranking of individual documents in the control subset of documents;
  
  computing, based at least in part on the cutoff point, a quality criterion associated with the classifier;
  
  determining, based at least in part on the quality criterion, a first quality of performance of the classifier as applied to the control subset of documents;
  
  receiving an input;
  
  determining, based at least in part on the input, to initiate a second iteration;
  
  determining a second quality of performance of the classifier for the second iteration; and
  
  displaying a comparison of the first quality of performance and the second quality of performance.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The system as claim 1 recites, wherein the input comprises a user input received from a user.
  - 3. The system as claim 1 recites, wherein the input comprises a computerized input.
  - 4. The system as claim 1 recites, the operations further comprising:
    - generating at least one graph of respective quality criterions determined for the first iteration and the second iteration in view of respective iteration serial numbers for the first iteration and the second iteration; and
      
      displaying the at least one graph to represent the comparison.
  - 5. The system as claim 1 recites, the operations further comprising generating a computer display representative of the first output of the categorization process.
  - 6. The system as claim 5 recites, wherein the computer display includes a histogram of ranks for each issue in the set of issues.
  - 7. The system as claim 5 recites, wherein:
    - the computer display comprises a function of an indication of quality measures for at least the cutoff point associated with the first iteration and one or more other cutoff points associated with additional iterations.
  - 8. The system as claim 7 recites, the operations further comprising:
    - determining at least one of an un-weighted F-measure, a weighted F-measure, a precision, a recall, or an accuracy associated with the first iteration; and
      
      determining a quality measure of the quality measures associated with the cutoff point based on one or more of the un-weighted F-measure, the weighted F-measure, the precision, the recall, or the accuracy associated with the first iteration.
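
The quality measures named in claim 8 follow standard definitions, and the cutoff point of claim 1 binarizes a ranking into relevant and non-relevant. The sketch below assumes classifier scores and expert labels for the control subset are available as parallel lists. The function names, and the choice of picking the cutoff that maximizes the F-measure, are illustrative assumptions rather than the patent's stated procedure.

```python
def quality_at_cutoff(scores, labels, cutoff, beta=1.0):
    """Binarize scores at `cutoff` and compute standard quality measures.

    beta=1.0 gives the un-weighted F-measure; other values give a
    weighted F-measure (beta > 1 favours recall).
    """
    predicted = [s >= cutoff for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    tn = sum((not p) and (not y) for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_measure = (1 + b2) * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(labels) if labels else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


def best_cutoff(scores, labels, beta=1.0):
    """Pick the observed score that maximizes the F-measure (an assumption)."""
    return max(
        sorted(set(scores)),
        key=lambda c: quality_at_cutoff(scores, labels, c, beta)["f_measure"],
    )
```

For example, `quality_at_cutoff(scores, labels, best_cutoff(scores, labels))` returns one set of measures that could serve as the quality criterion for a single iteration.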

9. A method comprising:
- receiving a first output of a categorization process applied to a training subset of documents of a plurality of documents, the first output including a first indication and a second indication for each document in the training set of documents, the first indication indicating a relevance between a document in the training set of documents and an issue in a set of issues and the second indication indicating a lack of relevance between the document and the issue;
  
  generating a classifier based at least in part on the first output;
  
  executing the classifier on the plurality of documents to determine a second output indicating an extent of relevance of each document in the plurality of documents to the issue;
  
  partitioning, based at least in part on the second output, individual documents in the plurality of documents into subsets of documents;
  
  adding additional documents from at least one subset of the subsets of documents into the training subset of documents to generate a control subset of documents;
  
  executing, as part of a first iteration, the classifier on the control subset of documents to determine a third output;
  
  determining, based at least in part on the third output, a threshold associated with the classifier, the threshold being associated with a cutoff point for binarizing a ranking of individual documents in the control subset of documents;
  
  computing, based at least in part on the cutoff point, a quality criterion associated with the classifier;
  
  determining, based at least in part on the quality criterion, a first quality of performance of the classifier as applied to the control subset of documents;
  
  receiving an input;
  
  determining, based at least in part on the input, to initiate a second iteration;
  
  determining a second quality of performance of the classifier for the second iteration; and
  
  displaying a comparison of the first quality of performance and the second quality of performance.
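
The partitioning and control-subset steps of the method can be sketched as follows. This is one possible reading, not the claimed implementation: equal-width score bands and a fixed number of randomly drawn documents per band are assumptions, as are the names `partition_by_score` and `build_control_subset`. The input is assumed to be a non-empty list of `(doc_id, score)` pairs.

```python
import random


def partition_by_score(scored_docs, n_bins=4):
    """Split (doc_id, score) pairs into n_bins equal-width score bands."""
    scores = [s for _, s in scored_docs]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores match
    bins = [[] for _ in range(n_bins)]
    for doc_id, s in scored_docs:
        idx = min(int((s - lo) / span * n_bins), n_bins - 1)
        bins[idx].append(doc_id)
    return bins


def build_control_subset(training_ids, bins, per_bin=1, seed=0):
    """Add documents drawn from each band to the training documents.

    Assumption: up to `per_bin` unseen documents are sampled per band.
    """
    rng = random.Random(seed)
    control = list(training_ids)
    seen = set(control)
    for band in bins:
        candidates = [d for d in band if d not in seen]
        for doc_id in rng.sample(candidates, min(per_bin, len(candidates))):
            control.append(doc_id)
            seen.add(doc_id)
    return control
```

Drawing from every band, rather than only the top-ranked one, is a design choice made here so that the control subset covers the whole score range.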

10. A computer storage device storing instructions that, when executed by one or more processors, cause a device to perform operations comprising:
- receiving a first output of a categorization process applied to a training subset of documents of a plurality of documents, the first output including a first indication and a second indication for each document in the training set of documents, the first indication indicating a relevance between a document in the training set of documents and an issue in a set of issues and the second indication indicating a lack of relevance between the document and the issue;
  
  generating a classifier based at least in part on the first output;
  
  executing the classifier on the plurality of documents to determine a second output, the second output indicating an extent of relevance of each document in the plurality of documents to the issue;
  
  partitioning individual documents in the plurality of documents into subsets of documents based at least in part on the second output;
  
  adding additional documents from at least one subset of the subsets of documents into the training subset of documents to generate a control subset of documents;
  
  executing, as part of a first iteration, the classifier on the control subset of documents to determine a third output;
  
  determining, based at least in part on the third output, a threshold associated with the classifier, the threshold being associated with a cutoff point for binarizing a ranking of individual documents in the control subset of documents;
  
  computing, based at least in part on the cutoff point, a quality criterion associated with the classifier;
  
  determining, based at least in part on the quality criterion, a first quality of performance of the classifier as applied to the control subset of documents;
  
  receiving an input;
  
  determining, based at least in part on the input, to initiate a second iteration;
  
  determining a second quality of performance of the classifier for the second iteration; and
  
  displaying a comparison of the first quality of performance and the second quality of performance.
- Dependent claims (11, 12, 13, 14, 15):
  - 11. The computer storage device as claim 10 recites, wherein the input comprises a user input received from a user.
  - 12. The computer storage device as claim 10 recites, wherein the input comprises a computerized input.
  - 13. The computer storage device as claim 10 recites, the operations further comprising:
    - generating at least one graph of respective quality criterions determined for the first iteration and the second iteration in view of respective iteration serial numbers for the first iteration and the second iteration; and
      
      displaying the at least one graph to represent the comparison.
  - 14. The computer storage device as claim 10 recites, the operations further comprising generating a computer display representative of the first output of the categorization process.
  - 15. The computer storage device as claim 14 recites, wherein:
    - the computer display comprises a function of an indication of quality measures for at least the cutoff point associated with the first iteration and one or more other cutoff points associated with additional iterations.
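
The comparison display of the independent claims, and the graph of quality criteria against iteration serial numbers in claims 4 and 13, could look like the text-based sketch below. It assumes each iteration's quality is a number between 0 and 1 keyed by its serial number. The function names and the bar-chart rendering are hypothetical and stand in for whatever display an actual system would use.

```python
def render_quality_comparison(quality_by_iteration, width=40):
    """Render one text bar per iteration, ordered by serial number."""
    lines = []
    for serial, quality in sorted(quality_by_iteration.items()):
        bar = "#" * round(quality * width)
        lines.append(f"iteration {serial}: {bar} {quality:.2f}")
    return "\n".join(lines)


def quality_change(quality_by_iteration, first, second):
    """Return the difference in quality between two iterations."""
    return quality_by_iteration[second] - quality_by_iteration[first]
```

A positive `quality_change` between the first and second iterations would indicate that the additional round improved the classifier on the control subset.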

Current Assignee: Microsoft Israel Research And Development (2002) Ltd (Microsoft Corporation)
Original Assignee: Microsoft Israel Research And Development (2002) Ltd (Microsoft Corporation)
Inventors: Ravid, Yiftach
Primary Examiner(s): ALAM, SHAHID AL
Application Number: US15/212,092
Publication Number: US 20170011118A1
Time in Patent Office: 564 Days
Field of Search: 707740, 707749
CPC Class Codes:
- G06F 16/3326: using relevance feedback fr...
- G06F 16/35: Clustering; Classification
- G06F 16/353: into predefined classes
- G06F 16/93: Document management systems
- G06F 16/951: Indexing; Web crawling tech...
- G06N 20/00: Machine learning