Search term hit counts in an electronic discovery system
First Claim
1. A computer-implemented method for determining search term hit counts in an electronic discovery system, the method comprising:
identifying an electronic data set comprising data items for collection by an electronic discovery system;
determining an estimated size of memory required to collect the data items of the electronic data set;
determining, via a computer device processor, whether or not the estimated size of memory required to collect data items of the electronic data set is below a predetermined threshold;
collecting, via a computing device processor, the data items in response to determining that the estimated size of memory required to collect the data items of the electronic data set is below the predetermined threshold, thus resulting in a collected data set;
receiving, at a computing device, inputs that provide for a search term set that includes a plurality of search terms, wherein the search term set is associated with a case in the electronic discovery system and a search term is defined as a word or phrase associated with the case for identifying data items in the collected data set;
prior to finalizing the search term set that will be applied to all of the collected data associated with the case, determining, via a computing device processor, a plurality of search term hit counts by applying the search term set to a portion of the collected data set, wherein the search term hit counts are defined as a number of data items in the portion of the collected data set in which (1) a specific search term included in the search term set occurs or (2) any one of the search terms in the search term set occur, and wherein the search term hit counts include:
a per-data type search term hit count for one or more data types in the collected data set, wherein the one or more data types include electronic mail data and electronic file data, and a per-custodian search term hit count for each custodian associated with the case, wherein determining the per-data type search term hit count for one or more data types in the collected data set further comprises determining for each of the one or more data types in the collected data set a number of occurrences of the search term in each of the one or more data types, and wherein the per-custodian search term hit count is defined as a number of data items in the portion of the collected data set in which (1) the specific search term included in the search term set occurs or (2) any one of the search terms in the search term set occur and the data items in the search term hit count are also associated with a corresponding custodian;
predicting, via a computing device processor, for an entirety of the collected data set based on results of applying the search term set to the portion of the collected data set, a volume of the collected data set required to be reviewed;
determining, via a computing device processor, for each of the plurality of search terms, a file size, each file size corresponding to an amount of storage space occupied by each of the data items that comprise a corresponding search term; and
storing, in computing device memory, the plurality of search term hit counts and the associated file size of the data items, wherein storing includes storing the per-custodian search term hit counts in a corresponding custodian profile within a custodian database and storing all of the search term hit counts in an associated search term file within the electronic discovery system.
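Claim 1 defines two kinds of hit count: the number of data items in which a specific term occurs, and the number in which any term of the set occurs, broken down per data type and per custodian. The following is a minimal Python sketch of those definitions, not the patented implementation. `DataItem`, `hit_counts`, the `ANY` key, and the case-insensitive substring matching are hypothetical stand-ins for a real indexing engine. Following the claim's wording, the per-data type figures count occurrences of a term, while the per-term and per-custodian figures count data items.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """Hypothetical record for one collected data item."""
    item_id: str
    custodian: str
    data_type: str  # e.g. "email" or "file"
    text: str


ANY = "*"  # hypothetical key meaning "any term in the search term set"


def occurrences(item: DataItem, term: str) -> int:
    # Assumption: case-insensitive, non-overlapping substring counting
    # stands in for a real full-text search engine.
    return item.text.lower().count(term.lower())


def hit_counts(sample: list[DataItem], terms: list[str]) -> dict:
    """Compute hit counts over a sampled portion of the collected data set."""
    per_term = Counter()       # term -> number of items containing it
    per_type = Counter()       # (data_type, term) -> occurrences of the term
    per_custodian = Counter()  # (custodian, term or ANY) -> number of items
    any_term = 0               # items containing at least one term
    for item in sample:
        matched = False
        for term in terms:
            n = occurrences(item, term)
            if n:
                matched = True
                per_term[term] += 1
                per_type[(item.data_type, term)] += n
                per_custodian[(item.custodian, term)] += 1
        if matched:
            any_term += 1
            per_custodian[(item.custodian, ANY)] += 1
    return {
        "per_term": per_term,
        "any_term": any_term,
        "per_type": per_type,
        "per_custodian": per_custodian,
    }
```

For example, if one email in the sample mentions "merger" twice, `per_type[("email", "merger")]` counts both occurrences, whereas `per_term["merger"]` counts that email only once.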
Abstract
Embodiments of the invention relate to systems, methods, and computer program products for improved electronic discovery. More specifically, embodiments relate to managing the process of creating search term sets to be applied to electronic data sets associated with a case in an electronic discovery system. A search term management application allows multiple users to work collaboratively to define the final search term set that is subsequently applied to the corpus of electronic data for the case. The application also tracks the overall search term creation process. In addition, embodiments provide a search term hit count engine that is configured to determine search term hit counts for data as a means of predicting the volume of data that needs to be reviewed.
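The abstract describes using hit counts on a portion of the data to predict the volume of data to be reviewed. One simple way to do that, assumed here rather than taken from the claims (which do not name an estimator), is to draw a random sample and scale its any-term hit rate up to the whole collection. `sample_portion` and `predict_review_volume` are hypothetical names, and the estimate is only as good as the assumption that the sample is representative.

```python
import random


def sample_portion(item_ids: list[str], fraction: float, seed: int = 0) -> list[str]:
    """Draw a simple random sample of item IDs (hypothetical sampling policy)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not item_ids:
        return []
    k = max(1, round(len(item_ids) * fraction))
    # A seeded generator makes the sample reproducible across runs.
    return random.Random(seed).sample(item_ids, k)


def predict_review_volume(any_term_hits: int, sample_size: int, total_items: int) -> int:
    """Scale the sample's any-term hit rate up to the whole collected data set.

    Assumes the sample is representative of the full collection.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return round(any_term_hits / sample_size * total_items)
```

For instance, 120 any-term hits in a 1,000-item sample of a 50,000-item collection projects roughly 6,000 items for review.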
12 Claims
5. An apparatus for determining search term hit counts in an electronic discovery system, the apparatus comprising:
a computing platform including at least one processor and a memory; and
a search term hit count engine stored in the memory, executable by the at least one processor and configured to cause the at least one processor to:
identify an electronic data set comprising data items for collection by an electronic discovery system;
determine an estimated size of memory required to collect the data items of the electronic data set;
determine whether or not the estimated size of memory required to collect the data items of the electronic data set is below a predetermined threshold;
collect the data items in response to determining that the estimated size of memory required to collect the data items of the electronic data set is below the predetermined threshold, thus resulting in a collected data set;
receive a search term set that includes a plurality of search terms associated with a case in the electronic discovery system, and wherein a search term is defined as a word or phrase associated with the case for identifying data items in the collected data set;
prior to finalizing the search term set that will be applied to all of the electronic data associated with the case, determine a plurality of search term hit counts by applying the search term set to a portion of the collected data set, wherein the search term hit counts are defined as a number of data items in the portion of the collected data set in which (1) a specific search term included in the search term set occurs or (2) any one of the search terms in the search term set occur, and wherein the search term hit counts include:
a per-data type search term hit count for one or more data types in the collected data set, wherein the one or more data types include electronic mail data and electronic file data, and a per-custodian search term hit count for each custodian associated with the case, wherein determining the per-data type search term hit count for one or more data types in the collected data set further comprises determining for each of the one or more data types in the collected data set a number of occurrences of the search term in each of the one or more data types, and wherein the per-custodian search term hit count is defined as a number of data items in the portion of the collected data set in which (1) the specific search term included in the search term set occurs or (2) any one of the search terms in the search term set occur and the data items in the search term hit count are also associated with a corresponding custodian;
predict, for an entirety of the collected data set based on results of applying the search term set to the portion of the collected data set, the volume of the collected data set required to be reviewed;
determine for each of the plurality of search terms, a file size, each file size corresponding to an amount of storage space occupied by each of the data items that comprise a corresponding search term; and
store, in computing device memory, the plurality of search term hit counts and the associated file size, wherein storing includes storing the per-custodian search term hit counts in a corresponding custodian profile within a custodian database and storing all of the search term hit counts in an associated search term file within the electronic discovery system.
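The collection gate recited in claim 5 (and in claims 1 and 9) estimates the memory needed to collect a data set and collects only when that estimate is below a predetermined threshold. The claims do not say how the estimate is computed, so this minimal Python sketch assumes it is the sum of item sizes plus a fixed overhead allowance. `SourceItem`, `OVERHEAD_PERCENT`, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceItem:
    """Hypothetical metadata for an item that has not been collected yet."""
    item_id: str
    size_bytes: int


# Hypothetical allowance for metadata and index overhead, in percent.
OVERHEAD_PERCENT = 10


def estimated_collection_bytes(items: list[SourceItem]) -> int:
    """Raw item sizes plus the overhead allowance (integer arithmetic)."""
    raw = sum(item.size_bytes for item in items)
    return raw + raw * OVERHEAD_PERCENT // 100


def collect_if_below_threshold(
    items: list[SourceItem], threshold_bytes: int
) -> Optional[list[SourceItem]]:
    """Collect only when the estimate is strictly below the threshold."""
    if estimated_collection_bytes(items) < threshold_bytes:
        return list(items)  # stand-in for the real collection step
    return None  # the claims do not say what happens otherwise
```

Integer arithmetic keeps the threshold comparison exact; a floating-point overhead factor could make items that sit right at the threshold flip either way.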
9. A computer program product comprising:
a non-transitory computer-readable medium comprising:
a first set of codes for causing a computer to identify an electronic data set comprising data items for collecting by an electronic discovery system;
a second set of codes for causing a computer to determine an estimated size of memory required to collect data items of the electronic data set;
a third set of codes for causing a computer to determine whether or not the estimated size of memory required to collect data items of the electronic data set is below a predetermined threshold;
a fourth set of codes for causing a computer to collect the data items in response to determining that the estimated size of memory required to collect the data items of the electronic data set is below the predetermined threshold, thus resulting in a collected data set;
a fifth set of codes for causing a computer to receive inputs that provide for a search term set that includes a plurality of search terms, wherein the search term set is associated with a case in the electronic discovery system and a search term is defined as a word or phrase associated with the case for identifying data items in the collected data set;
a sixth set of codes for causing a computer to, prior to finalizing the search term set that will be applied to all of the electronic data associated with the case, determine a plurality of search term hit counts by applying the search term set to a portion of the collected data set, wherein the search term hit counts are defined as a number of data items in the portion of the collected data set in which (1) a specific search term included in the search term set occurs or (2) any one of the search terms in the search term set occur, and wherein the search term hit counts include:
a per-data type search term hit count for one or more data types in the collected data set, wherein the one or more data types include electronic mail data and electronic file data, and a per-custodian search term hit count for each custodian associated with the case, wherein determining the per-data type search term hit count for one or more data types in the collected data set further comprises determining for each of the one or more data types in the collected data set a number of occurrences of the search term in each of the one or more data types, and wherein the per-custodian search term hit count is defined as a number of data items in the portion of the collected data set in which (1) the specific search term included in the search term set occurs or (2) any one of the search terms in the search term set occur and the data items in the search term hit count are also associated with a corresponding custodian;
a seventh set of codes for causing a computer to predict, for an entirety of the collected data set based on results of applying the search term set to the portion of the collected data set, the volume of the collected data set required to be reviewed;
an eighth set of codes for causing a computer to determine, for each of the plurality of search terms, a file size, each file size corresponding to an amount of storage space occupied by each of the data items that comprise a corresponding search term; and
a ninth set of codes for causing a computer to store the plurality of search term hit counts and the associated file size, wherein storing includes storing the per-custodian search term hit counts in a corresponding custodian profile within a custodian database and storing all of the search term hit counts in an associated search term file within the electronic discovery system.
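Claim 9 (like claims 1 and 5) also recites a per-search-term file size and two storage targets: a custodian profile in a custodian database and a search term file. This is a minimal sketch under stated assumptions: the file size for a term is read as the total bytes of the data items that hit it, SQLite stands in for the custodian database, and JSON stands in for the search term file format. The `custodian_profile` table, its columns, the `(item_id, text, size_bytes)` tuple shape, and all function names are hypothetical.

```python
import json
import sqlite3


def term_file_sizes(items: list[tuple[str, str, int]], terms: list[str]) -> dict[str, int]:
    """Total bytes of the items hit by each term.

    Each item is a hypothetical (item_id, text, size_bytes) tuple.
    """
    sizes = {term: 0 for term in terms}
    for _item_id, text, size_bytes in items:
        lowered = text.lower()
        for term in terms:
            if term.lower() in lowered:
                sizes[term] += size_bytes
    return sizes


def store_custodian_counts(
    conn: sqlite3.Connection, per_custodian: dict[tuple[str, str], int]
) -> None:
    """Upsert per-custodian hit counts into a hypothetical custodian_profile table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS custodian_profile ("
        "custodian TEXT, term TEXT, hits INTEGER, PRIMARY KEY (custodian, term))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO custodian_profile VALUES (?, ?, ?)",
        [(custodian, term, hits) for (custodian, term), hits in per_custodian.items()],
    )
    conn.commit()


def search_term_file_json(hit_counts: dict[str, int], sizes: dict[str, int]) -> str:
    """Serialize all hit counts and per-term sizes as the search term file body."""
    return json.dumps({"hit_counts": hit_counts, "file_sizes": sizes}, indent=2, sort_keys=True)
```

Writing the search term file is then a single call such as `pathlib.Path(path).write_text(search_term_file_json(counts, sizes))`; keeping serialization separate from I/O makes the format easy to check on its own.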
Specification