SYSTEM FOR ENHANCING EXPERT-BASED COMPUTERIZED ANALYSIS OF A SET OF DIGITAL DOCUMENTS AND METHODS USEFUL IN CONJUNCTION THEREWITH
First Claim
1. An electronic document analysis method receiving N electronic documents pertaining to a case encompassing a set of issues including at least one issue and establishing relevance of at least the N electronic documents to at least one individual issue in the set of issues, the method comprising, for at least one individual issue from among said set of issues:
i. receiving an output of a categorization process applied to documents in at least control subsets of said at least N electronic documents, said output including, for each document in said subsets, one of a relevant-to-said-individual issue indication and a non-relevant-to-said-individual issue indication;
ii. seeking an input as to whether or not to initiate a new iteration I. If not, terminate; if so continue to step iii;
iii. selecting m documents from among a subset of the N documents that are not in the control set and that were not used in previous rounds for training the classifier;
iv. receiving an output of a categorization process applied to the m documents;
v. adding the m documents to an existing training subset and building a text classifier simulating said categorization process using said output for all documents in said training subset of documents;
vi. evaluating said text classifier's quality using said output for documents in said control subset;
vii. selecting a cut-off point for binarizing said rankings of said documents in said control subset;
viii. using said cut-off point, computing and storing at least one quality criterion characterizing said binarizing of said rankings of said documents in said control subset, thereby to define a quality of performance indication of a current iteration I;
ix. displaying a comparison of the quality of performance indication of the current iteration I to quality of performance indications of previous iterations; and
x. returning to step ii.
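The iterative protocol of steps i-x above can be sketched as follows. The toy keyword-difference classifier, the batch size m, the fixed number of rounds, the 0.5 cut-off, and the F-measure quality criterion are illustrative assumptions only; the claim does not fix any particular learning algorithm, cut-off, or quality criterion.

```python
# Illustrative sketch of the claimed iterative review loop (steps i-x).
import math
from collections import Counter

def train_classifier(texts, labels):
    """Step v: build a text classifier simulating the expert categorization.
    Toy model: weight each term by (relevant count - non-relevant count)."""
    weights = Counter()
    for text, label in zip(texts, labels):
        for term in text.split():
            weights[term] += 1 if label == 1 else -1
    return lambda doc: 1 / (1 + math.exp(-sum(weights[t] for t in doc.split())))

def f1(truth, pred):
    """Step viii: one possible quality criterion (F-measure)."""
    tp = sum(t == p == 1 for t, p in zip(truth, pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(truth, pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def iterate(pool, control_docs, control_labels, reviewer, m=4, rounds=2):
    """pool: documents outside the control set (step iii draws from it);
    reviewer: callable text -> 0/1 standing in for the expert (step iv)."""
    train_texts, train_labels, history = [], [], []
    for i in range(rounds):                           # step ii: new iteration?
        batch, pool = pool[:m], pool[m:]              # step iii: m unseen docs
        train_texts += batch
        train_labels += [reviewer(d) for d in batch]  # steps iv-v
        classify = train_classifier(train_texts, train_labels)
        scores = [classify(d) for d in control_docs]  # step vi: rank controls
        cutoff = 0.5                                  # step vii: cut-off point
        binarized = [1 if s >= cutoff else 0 for s in scores]
        history.append(f1(control_labels, binarized)) # step viii: store quality
        print(f"iteration {i + 1}: F1 = {history[-1]:.2f}")  # step ix: display
    return history                                    # step x: loop to step ii
```

Each pass through the loop grows the training subset while the control subset stays fixed, so the stored F-measures are comparable across iterations, which is what the step ix display relies on.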
2 Assignments
0 Petitions
Abstract
An electronic document analysis method receiving N electronic documents pertaining to a case encompassing a set of issues including at least one issue and establishing relevance of at least the N documents to at least one individual issue in the set of issues, the method comprising, for at least one individual issue from among the set of issues, receiving an output of a categorization process applied to each document in training and control subsets of the at least N documents, the output including, for each document in the subsets, one of a relevant-to-the-individual issue indication and a non-relevant-to-the-individual issue indication; building a text classifier simulating the categorization process using the output for all documents in the training subset of documents; and running the text classifier on the at least N documents thereby to obtain a ranking of the extent of relevance of each of the at least N documents to the individual issue. The method may also comprise evaluating the text classifier's quality using the output for all documents in the control subset.
39 Claims
38. An electronic document analysis system operative for receiving N electronic documents pertaining to a case encompassing a set of issues including at least one issue and establishing relevance of at least the N electronic documents to at least one individual issue in the set of issues, the system comprising:
a processor operative, for at least one individual issue from among said set of issues, for:
i. receiving an output of a categorization process applied to documents in at least control subsets of said at least N electronic documents, said output including, for each document in said subsets, one of a relevant-to-said-individual issue indication and a non-relevant-to-said-individual issue indication;
ii. seeking an input as to whether or not to initiate a new iteration I. If not, terminate; if so continue to step iii;
iii. selecting m documents from among a subset of the N documents that are not in the control set and that were not used in previous rounds for training the classifier;
iv. receiving an output of a categorization process applied to the m documents;
v. adding the m documents to an existing training subset and building a text classifier simulating said categorization process using said output for all documents in said training subset of documents;
vi. evaluating said text classifier's quality using said output for documents in said control subset;
vii. selecting a cut-off point for binarizing said rankings of said documents in said control subset;
viii. using said cut-off point, computing and storing at least one quality criterion characterizing said binarizing of said rankings of said documents in said control subset, thereby to define a quality of performance indication of a current iteration I;
ix. displaying a comparison of the quality of performance indication of the current iteration I to quality of performance indications of previous iterations; and
x. returning to step ii.
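Steps vii and viii of the claimed method select a cut-off point for binarizing the control-set rankings and compute a quality criterion from that binarization. A minimal sketch, assuming the quality criterion is F-measure and that the cut-off is chosen from among the observed scores so as to maximize it; the claim itself leaves both the criterion and the selection rule open.

```python
# Sketch of cut-off selection over a control-set ranking (steps vii-viii).
def precision_recall_f(truth, pred):
    """Compute precision, recall, and F-measure for binarized predictions
    against the expert's relevant (1) / non-relevant (0) indications."""
    tp = sum(t == p == 1 for t, p in zip(truth, pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(truth, pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

def best_cutoff(scores, truth):
    """Step vii: try each observed score as a cut-off, binarize the ranking
    at it (step viii), and return the (cutoff, f_measure) pair that wins."""
    best = (0.0, -1.0)
    for cutoff in sorted(set(scores)):
        pred = [1 if s >= cutoff else 0 for s in scores]
        _, _, f = precision_recall_f(truth, pred)
        if f > best[1]:
            best = (cutoff, f)
    return best
```

For example, given classifier scores [0.9, 0.8, 0.6, 0.4, 0.2] and expert indications [1, 1, 0, 1, 0] on the control subset, the sweep selects the cut-off 0.4, which trades one false positive for full recall.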
Specification