System and method for indexing electronic discovery data

US 8,924,395 B2
Filed: 10/06/2011
Issued: 12/30/2014
Est. Priority Date: 10/06/2010
Status: Active Grant

Abstract

Systems and methods for efficiently processing electronically stored information (ESI) are described. The systems and methods describe processing ESI in preparation for, or association with, litigation. The invention preserves the contextual relationships among documents when processing and indexing data, allowing for increased precision and recall during data analytics.

259 Citations

23 Claims

1. A method, comprising the steps of:
- (a) determining a file type using a file typer for each document in a plurality of documents;
  
  (b) extracting data using a file ripper from each document typed in step (a), wherein the data is selected from the group consisting of metadata, text, records, tables, images, pictures, auditory data or combinations thereof;
  
  (c) testing the data extracted in step (b) for an embedded object, and if one or more embedded objects are detected, appending the data from the embedded objects to a buffer wherein the data in the documents;
  
  (d) repeating steps (a) to (c) recursively for each document until no additional embedded objects are detected in the documents;
  
  (e) repeating steps (a) to (d) iteratively for each document in the plurality of documents;
  
  (f) creating an object map which preserves respective locations of the embedded objects within each document in the plurality of documents;
  
  (g) replacing the embedded objects within each document of the documents with text, where the spatial relationship between the text at the location and the surrounding text is specified by the object map; and
  
  (h) generating an index which preserves hierarchical relationships among the documents and the embedded objects, where the documents and the embedded objects have at least one individual identifier associated with each of the documents and the embedded objects.
  - 2. The method of claim 1, wherein the embedded objects contain at least one additional embedded object.
  - 3. The method of claim 1, wherein text of a visual representation of an embedded object within the document is preserved both in substance and location with respect to text in document.
  - 4. The method of claim 1, wherein for every file type there is an individual, corresponding extraction.
  - 5. The method of claim 2, further comprising repeating steps (a) to (e) recursively for all the embedded objects, and if at least one additional embedded object is detected, performing an extraction of the additional embedded objects until no additional embedded objects are detected.
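
Steps (a) to (h) of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `Doc` class, the `rip` and `build_index` functions, the square-bracket convention for replacement text, and the use of character offsets as "locations" are all assumptions made for illustration. The sketch types each document, recurses into embedded objects until none remain, records each object's offset in an object map, splices the object's text back in at that offset, and stores every document and object in an index under its own identifier with a pointer to its parent.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical in-memory document: text plus embedded objects keyed by offset."""
    name: str
    kind: str                                      # stand-in for a detected file type
    text: str = ""
    embedded: dict = field(default_factory=dict)   # character offset -> Doc


def file_typer(doc):
    # Step (a): a real typer would inspect file contents; here the type is given.
    return doc.kind


def rip(doc, index, parent_id=None, depth=0):
    """Steps (b)-(d) and (f)-(h) for one document, recursing into embedded objects."""
    doc_id = len(index) + 1        # individual identifier (assumed scheme)
    index[doc_id] = None           # reserve the identifier before recursing
    object_map = {}                # offset in this document -> child identifier
    flat = doc.text
    # Work from the last offset to the first so earlier offsets stay valid
    # while replacement text is spliced in.
    for offset in sorted(doc.embedded, reverse=True):
        child_id, child_text = rip(doc.embedded[offset], index, doc_id, depth + 1)
        object_map[offset] = child_id
        flat = flat[:offset] + "[" + child_text + "]" + flat[offset:]
    index[doc_id] = {
        "name": doc.name,
        "type": file_typer(doc),
        "parent": parent_id,       # preserves the hierarchical relationship
        "depth": depth,
        "object_map": object_map,
        "text": flat,
    }
    return doc_id, flat


def build_index(documents):
    # Step (e): iterate over every document in the collection.
    index = {}
    for doc in documents:
        rip(doc, index)
    return index


# Example: an email containing a memo, which in turn contains a spreadsheet.
sheet = Doc("q3.xlsx", "spreadsheet", "revenue 42")
memo = Doc("memo.docx", "text", "See table  for totals.", {10: sheet})
email = Doc("mail.msg", "email", "Attached: ", {10: memo})
index = build_index([email])
for ident, entry in index.items():
    print(ident, entry["type"], entry["parent"], entry["text"])
```

Identifiers are assigned in pre-order, so a parent always has a lower identifier than the objects embedded in it; any other unique scheme would serve equally well.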

6. A computer system for reviewing data, the computer system comprising:
- (a) a source of a plurality of electronic documents, the source provided by a non-transitory computer-readable storage medium;
  
  (b) a software sub-routine comprising a file typer, the software sub-routine embodied as software instructions, which when executed by a processor, cause the processor to determine the physical file type of a document;
  
  (c) a software sub-routine comprising a file ripper, the software sub-routine embodied as software instructions, which when executed by a processor, cause the processor to extract data from at least one document from the plurality of electronic documents, wherein the data is selected from the group consisting of metadata, text, records, tables, images, pictures, auditory data or combinations thereof;
  
  (i) wherein the file ripper tests each document for linked or embedded objects, and
  
  (ii) wherein the file ripper recursively repeats step (i) if additional linked or embedded objects are detected;
  
  (d) an index comprising data from the documents and embedded objects wherein the index preserves hierarchical relationships among the embedded objects and wherein each embedded object has at least one individual identifier showing the respective locations of the embedded objects within the documents preserved through the use of an object map, and
  
  (e) a software sub-routine embodied as software instructions, which when executed by the processor, cause the processor to replace the embedded objects within the documents with text corresponding to the embedded objects, the text inserted at the respective locations specified by the object map.
  - 7. The computer system of claim 6, wherein the index is stored in a buffer on a computer-readable storage medium capable of receiving requests for specific data characteristics and identifying documents or embedded objects with those characteristics.
  - 8. The computer system of claim 6, further comprising a library housing methods of extraction for all file types respectively for documents and embedded objects.
  - 9. The computer system of claim 6, wherein the computer system is used for preparation of documents which are to be reviewed in connection with a litigation.
  - 10. The computer system of claim 6, wherein the computer system is located within a plurality of servers, processors and storage media in communication over a network.
  - 11. The computer system of claim 6, wherein the computer system comprises a terminal for accepting user input or displaying data processed by a computer-readable storage medium.
  - 12. The computer system of claim 6, wherein the source of a plurality of electronic documents is in communication with other components of the computer system via the Internet.
  - 13. The computer system of claim 6, wherein each document is selected from the group consisting of a text file, an image, and a spreadsheet.
  - 14. The computer system of claim 6, wherein each embedded object is selected from the group consisting of a text file, an image, and a spreadsheet.
  - 15. The computer system of claim 6, further comprising a first processor capable of receiving at least one document, extracting data from said document, and recursively searching said document for linked or embedded objects.
  - 16. The computer system of claim 15, further comprising a first computer-readable storage medium capable of containing all extracted data, and featuring a buffer with data for each linked or embedded object indexed separately, wherein each of a substance, location and textual relationship of each linked or embedded object and the document is preserved.
  - 17. The computer system of claim 15, further comprising a second processor in communication with the computer-readable storage medium capable of receiving requests for specific data characteristics and identifying the documents or embedded objects with those characteristics.
  - 18. The computer system of claim 15, further comprising a second computer-readable storage medium containing a library of all individual programs for all file types.
  - 19. The computer system of claim 15, wherein the second computer-readable storage medium is in communication with the processor.
  - 20. The computer system of claim 15, wherein the computer system is used for the preparation of documents in anticipation of litigation.
  - 21. The computer system of claim 15, wherein the computer system is located within a plurality of servers, processors and computer-readable storage media in communication over a network.
  - 22. The computer system of claim 15, wherein the computer system comprises a terminal for accepting user input or displaying extracted data processed by the first and second computer-readable storage media.
  - 23. The computer system of claim 15, wherein the source of a plurality of electronic documents is in communication with other components of the computer system via the Internet.
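
The file typer of claim 6(b), the per-type extraction library of claims 8 and 18, and the characteristic-based lookup of claims 7 and 17 can be sketched as follows. This is a hedged illustration only: the registry `EXTRACTORS`, the `extractor` decorator, the comma-based spreadsheet heuristic, and the `find` function are hypothetical names and simplifications, not anything the claims specify. The only external fact relied on is the standard 8-byte PNG signature.

```python
EXTRACTORS = {}   # file type -> extraction routine (the assumed "library")


def extractor(file_type):
    """Register one extraction routine per file type."""
    def register(fn):
        EXTRACTORS[file_type] = fn
        return fn
    return register


def file_typer(raw: bytes) -> str:
    # Judge the type from the content itself rather than from a file name.
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):       # PNG file signature
        return "image"
    lines = raw.decode("utf-8", errors="replace").splitlines()
    if lines and all("," in line for line in lines):
        return "spreadsheet"                        # crude CSV heuristic (assumption)
    return "text"


@extractor("text")
def rip_text(raw):
    return {"text": raw.decode("utf-8", errors="replace")}


@extractor("spreadsheet")
def rip_spreadsheet(raw):
    rows = [line.split(",") for line in raw.decode("utf-8").splitlines()]
    return {"tables": [rows]}


@extractor("image")
def rip_image(raw):
    return {"metadata": {"bytes": len(raw)}}


def rip(raw):
    # Type the content first, then dispatch to the matching extraction routine.
    file_type = file_typer(raw)
    return {"type": file_type, **EXTRACTORS[file_type](raw)}


def find(index, **characteristics):
    """Return identifiers whose entries match every requested characteristic."""
    return [ident for ident, entry in index.items()
            if all(entry.get(key) == value for key, value in characteristics.items())]


index = {
    1: rip(b"quarterly summary"),
    2: rip(b"region,revenue\nnorth,42"),
    3: rip(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8),
}
print(find(index, type="spreadsheet"))
```

Keeping one routine per type behind a single dispatch table means a new file type is supported by registering one more function, without touching the typer's callers.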

Current Assignee: Planet Data Solutions Incorporated
Original Assignee: Planet Data Solutions Incorporated
Inventors: Wade, Michael; Nelson, Robert
Primary Examiner(s): Alam, Hosain
Assistant Examiner(s): Abraham, Ahmed M
Application Number: US13/267,800
Publication Number: US 20120265762A1
Time in Patent Office: 1,181 Days
Field of Search: 707/705, 707/706, 707/707, 707/740, 707/709, 707/758, 707/777, 707/778, 709/707, 715/713, 715/234, 717/155, 717/100
US Class Current: 707/741
CPC Class Codes:
G06F 16/22   Indexing; Data structures t...
G06F 16/93   Document management systems
G06Q 50/184   Intellectual property manag...
G06V 30/224   of printed characters havin...
