System and method for indexing electronic discovery data
First Claim
1. A method, comprising the steps of:
(a) determining a file type using a file typer for each document in a plurality of documents;
(b) extracting data using a file ripper from each document typed in step (a), wherein the data is selected from the group consisting of metadata, text, records, tables, images, pictures, auditory data or combinations thereof;
(c) testing the data extracted in step (b) for an embedded object, and if one or more embedded objects are detected, appending the data from the embedded objects to a buffer wherein the data in the documents;
(d) repeating steps (a) to (c) recursively for each document until no additional embedded objects are detected in the documents;
(e) repeating steps (a) to (d) iteratively for each document in the plurality of documents;
(f) creating an object map which preserves respective locations of the embedded objects within each document in the plurality of documents;
(g) replacing the embedded objects within each document of the documents with text, where the spatial relationship between the text at the location and the surrounding text is specified by the object map; and
(h) generating an index which preserves hierarchical relationships among the documents and the embedded objects, where the documents and the embedded objects have at least one individual identifier associated with each of the documents and the embedded objects.
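The recursion in steps (c) to (h) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Doc` and `IndexEntry` classes, the `process` and `build_index` functions, and the character-offset form of the object map are all hypothetical, and file typing and ripping (steps (a) and (b)) are assumed to have already produced an ordered list of text runs and embedded documents.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Iterator, List, Optional, Tuple, Union


@dataclass
class Doc:
    """Hypothetical stand-in for a typed and ripped file:
    an ordered list of text runs and embedded documents."""
    name: str
    parts: List[Union[str, "Doc"]]


@dataclass
class IndexEntry:
    doc_id: int                        # individual identifier
    parent_id: Optional[int]           # None for a top-level document
    name: str
    text: str                          # text with embedded objects replaced by their text
    object_map: List[Tuple[int, int]]  # (character offset, embedded object's doc_id)


def process(doc: Doc, index: Dict[int, IndexEntry], ids: Iterator[int],
            parent_id: Optional[int] = None) -> IndexEntry:
    """Recursively flatten one document, recording where each embedded object sat."""
    doc_id = next(ids)
    buffer: List[str] = []
    object_map: List[Tuple[int, int]] = []
    offset = 0
    for part in doc.parts:
        if isinstance(part, Doc):
            # Embedded object detected: recurse until no further objects remain.
            child = process(part, index, ids, parent_id=doc_id)
            # Record the object's location, then splice its text in at that location.
            object_map.append((offset, child.doc_id))
            text = child.text
        else:
            text = part
        buffer.append(text)
        offset += len(text)
    entry = IndexEntry(doc_id, parent_id, doc.name, "".join(buffer), object_map)
    index[doc_id] = entry
    return entry


def build_index(docs: List[Doc]) -> Dict[int, IndexEntry]:
    """Iterate over every top-level document and build one parent-linked index."""
    index: Dict[int, IndexEntry] = {}
    ids = count(1)
    for doc in docs:
        process(doc, index, ids)
    return index


# Usage: an email with an attached report that itself embeds a spreadsheet.
email = Doc("mail.msg", ["See attached: ",
                         Doc("report.doc", ["Q3 totals ", Doc("chart.xls", ["42"]), " end"]),
                         " Thanks."])
for entry in build_index([email]).values():
    print(entry.doc_id, entry.parent_id, entry.name, entry.object_map)
```

Storing a `parent_id` on every entry is one simple way to keep the hierarchy recoverable from a flat index; the claim does not say how the relationships are stored.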
Abstract
Systems and methods for efficiently processing electronically stored information (ESI) are described. The systems and methods describe processing ESI in preparation for, or association with, litigation. The invention preserves the contextual relationships among documents when processing and indexing data, allowing for increased precision and recall during data analytics.
259 Citations
23 Claims
1. A method, comprising the steps (a) to (h) recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5
6. A computer system for reviewing data, the computer system comprising:
(a) a source of a plurality of electronic documents, the source provided by a non-transitory computer-readable storage medium;
(b) a software sub-routine comprising a file typer, the software sub-routine embodied as software instructions, which when executed by a processor, cause the processor to determine the physical file type of a document;
(c) a software sub-routine comprising a file ripper, the software sub-routine embodied as software instructions, which when executed by a processor, cause the processor to extract data from at least one document from the plurality of electronic documents, wherein the data is selected from the group consisting of metadata, text, records, tables, images, pictures, auditory data or combinations thereof;
(i) wherein the file ripper tests each document for linked or embedded objects, and
(ii) wherein the file ripper recursively repeats step (i) if additional linked or embedded objects are detected;
(d) an index comprising data from the documents and embedded objects wherein the index preserves hierarchical relationships among the embedded objects and wherein each embedded object has at least one individual identifier showing the respective locations of the embedded objects within the documents preserved through the use of an object map, and
(e) a software sub-routine embodied as software instructions, which when executed by the processor, cause the processor to replace the embedded objects within the documents with text corresponding to the embedded objects the text inserted at the respective locations specified by the object map.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
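As an illustration of element (b), a file typer could inspect a file's leading bytes instead of trusting its extension. This is only a sketch under that assumption: the claim does not define "physical file type" in these terms, and the `MAGIC_SIGNATURES` table and `detect_file_type` function are hypothetical names. The byte signatures themselves are the well-known ones for each format.

```python
# Well-known leading byte signatures ("magic numbers") for a few container formats.
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),                         # also the container for .docx/.xlsx
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy Office compound file
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def detect_file_type(data: bytes) -> str:
    """Return a coarse type label based on the file's leading bytes."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    return "unknown"


# Usage: the extension is never consulted, so a renamed file is still typed by content.
print(detect_file_type(b"%PDF-1.7\n"))
print(detect_file_type(b"plain text"))
```

A production typer would need many more signatures and a second pass inside containers such as ZIP or OLE2 to tell, for example, a spreadsheet from a word-processing file.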
Specification