System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith
First Claim
1. A system for computerized derivation of leads from a huge body of data, the system comprising:
an electronic repository including a multiplicity of accesses to a respective multiplicity of electronic documents and metadata including metadata parameters having metadata values characterizing each of said multiplicity of electronic documents;
a document rater using a processor to run a first computer algorithm on said multiplicity of electronic documents which yields a score which rates each of said multiplicity of electronic documents; and
a metadata-based document discriminator using the processor to run a second computer algorithm on at least some of said metadata which yields leads, each lead comprising at least one metadata value for at least one metadata parameter, whose value correlates with the score of said electronic documents, wherein said discriminator is operative to perform the following for each of a multiplicity of logical conditions defined over the individual documents' metadata;
a. access a subset of said multiplicity of electronic documents wherein membership in said subset, of an individual document from among said multiplicity of electronic documents, is determined by whether or not a current logical condition defined over said individual documents' metadata, is true;
b. compute at least one first value, including a proportion of documents having a relevance score whose value is “relevant”, over at least one subset;
c. provide a second value including a proportion of documents having a relevance score whose value is “relevant” from among all of said multiplicity of electronic documents; and
d. compare said first value to the second value, define said logical condition defined over said individual documents' metadata as a lead when said first and second values differ by at least a predetermined extent and generate an output indication of the lead.
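Steps a through d of the claim can be illustrated with a minimal sketch. This is not the patented implementation; it assumes each document carries a boolean "relevant" flag already produced by the document rater, that logical conditions are plain predicates over a metadata dictionary, and that "differ by at least a predetermined extent" means an absolute difference in proportions. The names `Document`, `proportion_relevant`, `derive_leads`, and `min_difference` are hypothetical, as are the sample metadata values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Document:
    metadata: Dict[str, str]  # metadata parameter -> metadata value
    relevant: bool            # assumed output of the document rater

# A logical condition defined over one document's metadata (hypothetical type).
Condition = Callable[[Dict[str, str]], bool]

def proportion_relevant(docs: List[Document]) -> float:
    """Fraction of documents whose relevance score is "relevant"."""
    return sum(d.relevant for d in docs) / len(docs) if docs else 0.0

def derive_leads(docs: List[Document],
                 conditions: Dict[str, Condition],
                 min_difference: float) -> List[str]:
    # Step c: proportion of relevant documents over the whole collection.
    second_value = proportion_relevant(docs)
    leads = []
    for name, condition in conditions.items():
        # Step a: the subset of documents for which the condition is true.
        subset = [d for d in docs if condition(d.metadata)]
        if not subset:
            continue  # assumption: a condition matching nothing is skipped
        # Step b: proportion of relevant documents within the subset.
        first_value = proportion_relevant(subset)
        # Step d: the condition is a lead when the two values differ enough.
        if abs(first_value - second_value) >= min_difference:
            leads.append(name)
    return leads

docs = [
    Document({"custodian": "alice", "type": "email"}, True),
    Document({"custodian": "alice", "type": "memo"}, True),
    Document({"custodian": "bob", "type": "email"}, False),
    Document({"custodian": "bob", "type": "memo"}, False),
]
conditions = {
    "custodian=alice": lambda m: m["custodian"] == "alice",
    "type=email": lambda m: m["type"] == "email",
}
leads = derive_leads(docs, conditions, min_difference=0.3)
print(leads)  # ['custodian=alice']
```

In this toy data half of all documents are relevant, every document held by "alice" is relevant, and emails are relevant at the collection-wide rate, so only the custodian condition is reported as a lead.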
Abstract
A system including an electronic repository having a multiplicity of accesses to a respective multiplicity of electronic documents and metadata; a document rater using a processor to run a first computer algorithm on the multiplicity of electronic documents which yields a score which rates each of the multiplicity of electronic documents to an issue; and a metadata-based document discriminator to run a second computer algorithm on at least some of the metadata which yields leads, each lead having at least one metadata value for at least one metadata parameter, whose value correlates with the score of the electronic documents to the issue, typically used in combination with an electronic document analysis method receiving N electronic documents pertaining to a case encompassing a set of issues including at least one issue and establishing relevance of at least the N documents to at least one individual issue in the set of issues.
70 Claims
1. A system for computerized derivation of leads from a huge body of data, the system comprising:
an electronic repository including a multiplicity of accesses to a respective multiplicity of electronic documents and metadata including metadata parameters having metadata values characterizing each of said multiplicity of electronic documents;
a document rater using a processor to run a first computer algorithm on said multiplicity of electronic documents which yields a score which rates each of said multiplicity of electronic documents; and
a metadata-based document discriminator using the processor to run a second computer algorithm on at least some of said metadata which yields leads, each lead comprising at least one metadata value for at least one metadata parameter, whose value correlates with the score of said electronic documents, wherein said discriminator is operative to perform the following for each of a multiplicity of logical conditions defined over the individual documents' metadata;
a. access a subset of said multiplicity of electronic documents wherein membership in said subset, of an individual document from among said multiplicity of electronic documents, is determined by whether or not a current logical condition defined over said individual documents' metadata, is true;
b. compute at least one first value, including a proportion of documents having a relevance score whose value is “relevant”, over at least one subset;
c. provide a second value including a proportion of documents having a relevance score whose value is “relevant” from among all of said multiplicity of electronic documents; and
d. compare said first value to the second value, define said logical condition defined over said individual documents' metadata as a lead when said first and second values differ by at least a predetermined extent and generate an output indication of the lead.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37)
38. A method for computerized derivation of leads from a huge body of data, the method comprising:
providing an electronic repository including a multiplicity of accesses to a respective multiplicity of electronic documents and metadata including metadata parameters having metadata values characterizing each of said multiplicity of electronic documents;
using a processor to rapidly run a first computer algorithm on said multiplicity of electronic documents which yields a score which rates each of said multiplicity of electronic documents; and
using the processor to rapidly run a second computer algorithm on at least some of said metadata which yields leads, each lead comprising at least one metadata value for at least one metadata parameter, whose value correlates with the score of said electronic documents, including performing the following for each of a multiplicity of logical conditions defined over the individual documents' metadata;
a. access a subset of said multiplicity of electronic documents wherein membership in said subset, of an individual document from among said multiplicity of electronic documents, is determined by whether or not a current logical condition defined over said individual documents' metadata, is true;
b. compute at least one first value, including a proportion of documents having a relevance score whose value is “relevant”, over at least one subset;
c. provide a second value including a proportion of documents having a relevance score whose value is “relevant” from among all of said multiplicity of electronic documents; and
d. compare said first value to the second value, define said logical condition defined over said individual documents' metadata as a lead when said first and second values differ by at least a predetermined extent and generate an output indication of the lead.
View Dependent Claims (39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68)
50. The method according to claim wherein said iteration I+1 uses a control subset larger than the control subset of iteration I, said control subset including the control subset of iteration I merged with an additional group of documents of pre-determined size randomly selected from said at least N documents.
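The growth of the control subset between iterations can be sketched as follows. This is only an illustration under stated assumptions: documents are identified by integer IDs, the additional group is drawn uniformly at random, and it is drawn only from documents not already in the control subset so that the merged subset is strictly larger whenever unused documents remain. The function name `grow_control_subset` and the parameter `group_size` are hypothetical.

```python
import random
from typing import Optional, Sequence, Set

def grow_control_subset(control: Set[int],
                        all_doc_ids: Sequence[int],
                        group_size: int,
                        rng: Optional[random.Random] = None) -> Set[int]:
    """Return the control subset for iteration I+1.

    The result is the iteration-I control subset merged with a randomly
    selected group of pre-determined size (group_size).
    """
    rng = rng or random.Random()
    # Assumption: only documents outside the current control subset are eligible.
    candidates = [doc_id for doc_id in all_doc_ids if doc_id not in control]
    # Never request more documents than remain available.
    extra = rng.sample(candidates, min(group_size, len(candidates)))
    return control | set(extra)

control_i = {0, 1}
control_next = grow_control_subset(control_i, range(10), group_size=3,
                                   rng=random.Random(0))
print(len(control_next))  # 5
```

Passing a seeded `random.Random` makes the selection reproducible, which may help when successive iterations need to be audited.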
69. A computer program product, comprising a tangible computer usable medium having a computer readable program code embodied therein, said computer readable program code adapted to be executed to implement a method for computerized derivation of leads from a huge body of data, the method comprising:
providing an electronic repository including a multiplicity of accesses to a respective multiplicity of electronic documents and metadata including metadata parameters having metadata values characterizing each of said multiplicity of electronic documents;
providing a relevance rater using a processor to provide, for each individual document from among said multiplicity of electronic documents, a relevance score which rates relevance of said individual document to at least one issue; and
providing a metadata-based relevant-irrelevant document discriminator using the processor to rapidly run a second computer algorithm on at least some of said metadata which yields leads, each lead comprising at least one metadata value for at least one metadata parameter, whose value correlates with relevance of said electronic documents to the issue, wherein said discriminator is operative to perform the following for each of a multiplicity of logical conditions defined over the individual documents' metadata;
a. access a subset of said multiplicity of electronic documents wherein membership in said subset, of an individual document from among said multiplicity of electronic documents, is determined by whether or not a current logical condition defined over said individual documents' metadata, is true;
b. compute at least one first value, including a proportion of documents having a relevance score whose value is “relevant”, over at least one subset;
c. provide a second value including a proportion of documents having a relevance score whose value is “relevant” from among all of said multiplicity of electronic documents; and
d. compare said first value to the second value, define said logical condition defined over said individual documents' metadata as a lead when said first and second values differ by at least a predetermined extent and generate an output indication of the lead.
View Dependent Claims (70)
Specification