Method and system for enterprise-wide retention of digital or electronic data

US 8,375,008 B1
Filed: 01/16/2004
Issued: 02/12/2013
Est. Priority Date: 01/17/2003
Status: Expired due to Fees

16 Assignments

0 Petitions

Abstract

A method and system for managing data can be used to provide a comprehensive solution to retaining electronic data within an enterprise. Data may come from backup tapes or a network. Email files may be separated from other files on the backup tape. Data from the email files may be extracted and fed to a collective database. The other files (from the file backup tapes) and data from the network are processed by a de-duplication engine to remove duplicates of the content while keeping metadata from each copy of the content. The content and metadata are forwarded to the collective database. Filters or other rules may be applied to the collective database to identify compliant or targeted data. Many different operations can then be performed on the compliant and targeted data.
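
The de-duplication step described in the abstract can be pictured with a minimal sketch. It assumes SHA-256 as the hash and an in-memory `CollectiveStore` class standing in for the collective database; both are illustrative choices, not details taken from the patent. Each distinct piece of content is stored once, while a metadata record is kept for every copy encountered, including where it came from.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CollectiveStore:
    """Hypothetical in-memory stand-in for the collective database."""
    content: dict = field(default_factory=dict)    # digest -> single copy of content
    metadata: list = field(default_factory=list)   # one record per copy encountered

    def ingest(self, data: bytes, meta: dict) -> str:
        # Hash the content portion; SHA-256 is an assumption, not named in the source.
        digest = hashlib.sha256(data).hexdigest()
        # Keep only the first copy of any given content.
        self.content.setdefault(digest, data)
        # Always keep the metadata, including where this copy was obtained.
        self.metadata.append({**meta, "digest": digest})
        return digest


store = CollectiveStore()
store.ingest(b"Q3 forecast", {"source": "backup_tape", "path": "tape01/forecast.doc"})
store.ingest(b"Q3 forecast", {"source": "network", "path": "fs1/finance/forecast.doc"})
store.ingest(b"Meeting notes", {"source": "network", "path": "fs1/notes.txt"})
print(len(store.content), len(store.metadata))  # 2 3
```

Two unique content portions are stored, yet all three metadata records survive, so the origin of every copy remains traceable.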

285 Citations

47 Claims

1. A method, performed by a computer system, for managing information associated with an enterprise, the method comprising the steps of:
- obtaining original data from a plurality of different electronic data sources including at least two electronic data sources including a backup tape data source and a networked data source, wherein the original data includes a plurality of files having content portions and metadata portions;
  
  determining email files within the original data;
  
  extracting information from the email files using an email extraction engine;
  
  forwarding the information extracted from the email files to a de-duplication engine;
  
  separating the information extracted from the email files and other original data in content portions and metadata portions;
  
  analyzing the content portions and the metadata portions by hashing the content portions of the information extracted and the other original data to form hashed values using a de-duplication engine and comparing the hashed values using the de-duplication engine;
  
  placing, into a collective database, at least a single copy of unique content portions; and
  
  placing, into a collective database, at least one copy of the metadata portions including information on where the files were obtained;
  
  from a user of the computer system, receiving at least one rule, including:
  
  a retention policy for the data; and
  
  a query for the data, wherein the query is other than a search for duplicate portions of the data;
  
  using the rule, which includes a logical operation to segregate the targeted data and the compliant data from other data, identifying:
  
  compliant data that comply with the retention policy; and
  
  targeted data that correspond to the query; and
  
  using the rule, preserving the compliant data and the targeted data within the collective database, while deleting at least a portion of other data that are neither compliant data nor targeted data within the collective database.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The method of claim 1, further comprising storing the targeted data in a persistent memory source.
  - 3. The method of claim 2, further comprising performing the steps of claim 1 and claim 2 on a periodic basis on different data sources.
  - 4. The method of claim 1, wherein the data source comprises a network, and further comprising deleting the other data from the network.
  - 5. The method of claim 4, wherein the data source comprises a backup tape.
  - 6. The method of claim 5, further comprising deleting the backup tape.
  - 7. The method of claim 5, wherein extracting the data comprises extracting the data from the backup tape without recreating a backup environment used for generating the backup tape.
  - 8. The method of claim 1, wherein the query for the data includes a query of at least one of the following for the data:
    - subject matter, author, type, and content.
  - 9. The method of claim 1, wherein the data include files having content portions and metadata portions, and wherein the files can be different file types, and, prior to collecting the data within the database, further comprising the steps of:
    - analyzing the content portions;
      
      placing, into the database, less than the total number of copies of unique content portions per file type; and
      
      placing, into the database, at least one copy of the metadata portions including information regarding from where the files were obtained.
  - 10. The method of claim 1, further comprising the step of:
    - auditing the database to determine data existing in the database that meet or do not meet a certain rule.
  - 11. The method of claim 1, further comprising the step of:
    - outputting a subset of the data in the database to a different database.
  - 12. The method of claim 1, wherein the extracting operation includes extracting new data from another data source.
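
The rule-handling steps at the end of claim 1 can be sketched as follows. This is a rough illustration under stated assumptions: the `Record` and `Rule` types, the predicate-style retention policy and query, and the sample data are all hypothetical. The logical operation is modelled as a set union of compliant and targeted records. The claim only requires deleting "at least a portion" of the other data, whereas this sketch deletes all of it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Set, Tuple


@dataclass(frozen=True)
class Record:
    author: str
    file_type: str
    created: date


@dataclass(frozen=True)
class Rule:
    retention: Callable[[Record], bool]  # retention policy (hypothetical predicate form)
    query: Callable[[Record], bool]      # query, unrelated to duplicate detection


def apply_rule(db: Dict[str, Record], rule: Rule) -> Tuple[Set[str], Set[str]]:
    compliant = {key for key, rec in db.items() if rule.retention(rec)}
    targeted = {key for key, rec in db.items() if rule.query(rec)}
    keep = compliant | targeted  # logical OR segregates kept data from other data
    for key in list(db):
        if key not in keep:
            del db[key]          # other data: neither compliant nor targeted
    return compliant, targeted


db = {
    "a": Record("alice", "email", date(1995, 3, 1)),
    "b": Record("bob", "doc", date(2001, 6, 9)),
    "c": Record("bob", "doc", date(1990, 1, 5)),
}
rule = Rule(
    retention=lambda r: r.created >= date(1997, 1, 1),
    query=lambda r: r.author == "alice",
)
compliant, targeted = apply_rule(db, rule)
print(sorted(compliant), sorted(targeted), sorted(db))  # ['b'] ['a'] ['a', 'b']
```

Record `a` falls outside the retention window but is preserved because it matches the query; record `c` satisfies neither condition and is deleted.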

13. A method, performed by one or more computer systems, for managing information associated with an enterprise, the method comprising the steps of:
- extracting specific data from original data obtained from at least two different data sources including a backup tape data source and a networked data source, wherein the specific data contains at least a content portion and a metadata portion, by first separating email data from the original data and extracting information from the email data and forwarding the information to a de-duplication engine;
  
  separating the content portion from the metadata portion for the information from the email data and other original data;
  
  analyzing the content portions;
  
  placing, into a collective database, at least a single copy of unique content portions;
  
  placing, into the collective database, at least one copy of the metadata portions including information on where the files were obtained;
  
  collecting the specific data within a collective database, wherein the collective database includes one copy of the content portion of the specific data and different copies of the metadata portion including information regarding from where the specific data was obtained;
  
  from a user of the computer system, receiving at least one rule, including:
  
  a retention policy for the data; and
  
  a query for the data, wherein the query is other than a search for duplicate portions of the data;
  
  using the rule, identifying:
  
  compliant data that comply with the retention policy; and
  
  targeted data that correspond to the query; and
  
  using the rule, preserving the compliant data and the targeted data within the database, while deleting at least a portion of other data that are neither compliant data nor targeted data within the database.
- Dependent Claims (14, 15, 16, 17)
  - 14. The method of claim 13, wherein extracting the data comprises extracting the data from the backup tape without recreating a backup environment used for generating the backup tape.
  - 15. The method of claim 13, further comprising storing the targeted data in a persistent memory source.
  - 16. The method of claim 13, wherein the at least two different data sources comprise a network, and further comprising deleting the other data from the network.
  - 17. The method of claim 13, wherein the query for the data includes a query of at least one of the following for the data:
    - subject matter, author, type, and content.

18. A method, performed by a computer system, for managing information associated with at least one enterprise, the method comprising:
- extracting data from at least two data sources, a backup tape data source and a networked data source, wherein the extracted data include files having content portions and associated metadata portions;
  
  determining email files within the original data;
  
  extracting information from the email files using an email extraction engine;
  
  forwarding the information extracted from the email files to a de-duplication engine;
  
  separating the information extracted from the email files and other original data in content portions and metadata portions;
  
  analyzing the content portions and the metadata portions by hashing the content portions of the information extracted and the other original data to form hashed values using a de-duplication engine and comparing the hashed values using the de-duplication engine;
  
  placing, into a collective database, at least a single copy of unique content portions;
  
  placing, into the collective database, at least one copy of the metadata portions including information on where the files were obtained;
  
  storing unique content portions in the collective database;
  
  storing the metadata portions in the collective database including information linking the metadata portions to their associated content portions;
  
  from a user of the computer system, receiving at least one rule, including:
  
  a retention policy for the data; and
  
  a query for the data, wherein the query is other than a search for duplicate portions of the data;
  
  using the rule, identifying:
  
  compliant data that comply with the retention policy; and
  
  targeted data that correspond to the query; and
  
  using the rule, preserving the compliant data and the targeted data within the collective database, while deleting at least a portion of other data that are neither compliant data nor targeted data within the collective database.
- Dependent Claims (19, 20, 21, 22, 23, 24, 25)
  - 19. The method of claim 18, wherein extracting the data comprises extracting the data from the backup tape without recreating a backup environment used for generating the backup tape.
  - 20. The method of claim 18, further comprising storing the targeted data in a different database.
  - 21. The method of claim 18, wherein the extracting includes extracting new data from another data source.
  - 22. The method of claim 18, and further comprising the step of:
    - deleting the other data from the networked data source.
  - 23. The method of claim 18, wherein the query for the data includes a query of at least one of the following for the data:
    - subject matter, author, type, and content.
  - 24. The method of claim 18, further comprising auditing the collective database to determine data existing in the collective database that meet or do not meet a certain rule.
  - 25. The method of claim 18, further comprising performing the steps of claim 24 on the data source on a periodic basis.
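
The auditing step of claim 24, combined with the metadata-to-content links recited in claim 18, might look roughly like the sketch below. The table layout, the short digest strings, and the `links_to_content` rule are hypothetical; the audit only reports which records meet or fail a rule and does not modify the database.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical collective database: content keyed by digest, metadata linked by digest.
content = {"d1": b"Q3 forecast", "d2": b"Meeting notes"}
metadata = {
    "m1": {"digest": "d1", "origin": "backup_tape"},
    "m2": {"digest": "d1", "origin": "network"},
    "m3": {"digest": "d9", "origin": "network"},  # link points at missing content
}


def audit(
    records: Dict[str, dict], rule: Callable[[dict], bool]
) -> Tuple[List[str], List[str]]:
    """Partition record ids into those that meet the rule and those that do not."""
    meets, fails = [], []
    for rec_id, rec in records.items():
        (meets if rule(rec) else fails).append(rec_id)
    return meets, fails


def links_to_content(rec: dict) -> bool:
    # Example rule: every metadata record must link to stored content.
    return rec["digest"] in content


meets, fails = audit(metadata, links_to_content)
print(meets, fails)  # ['m1', 'm2'] ['m3']
```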

26. A method utilizing a computer for executing instructions stored on a computer readable medium having software code embodied therein for managing information, the software code comprising computer-executable instructions for:
- extracting data from at least two data sources, a first backup tape data source and a second networked data source;
  
  determining email files within the original data;
  
  extracting information from the email files using an email extraction engine;
  
  forwarding the information extracted from the email files to a de-duplication engine;
  
  separating the information extracted from the email files and other original data in content portions and metadata portions;
  
  analyzing the content portions;
  
  placing, into a collective database, a single copy of unique content portions; and
  
  placing, into a collective database, at least one copy of the metadata portions including information regarding from where the files were obtained;
  
  from a user of the computer system, receiving at least one rule, including:
  
  a retention policy for the data; and
  
  a query for the data, wherein the query is other than a search for duplicate portions of the data;
  
  using the rule, identifying:
  
  compliant data that comply with the retention policy; and
  
  targeted data that correspond to the query;
  
  storing the targeted data in a persistent memory source; and
  
  using the rule, preserving the compliant data and the targeted data within the database, while deleting at least a portion of other data that are neither compliant data nor targeted data within the database.
- Dependent Claims (27, 28, 29, 30, 31, 32, 33)
  - 27. The method of claim 26, wherein the software code further comprises at least one instruction for performing the steps of claim 26 on a periodic basis on different data sources.
  - 28. The method of claim 26, wherein the software code further comprises at least one instruction for deleting the other data from the backup tape data source, the networked data source or both.
  - 29. The method of claim 26, wherein the data source comprises a backup tape, and wherein extracting the data comprises extracting the data from the backup tape without recreating a backup environment used for generating the backup tape.
  - 30. The method of claim 26, wherein:
    - the rule is a query; and
      
      the targeted data comprise a portion of the data within the database, wherein the targeted data correspond to the query.
  - 31. The method of claim 26, wherein the data include files having content portions and metadata portions, and wherein the files can be different file types, and wherein the software code further comprises at least one instruction for, prior to collecting the data within the database:
    - analyzing the content portions;
      
      placing, into the database, less than the total number of copies of unique content portions per file type; and
      
      placing, into the database, at least one copy of the metadata portions including information regarding from where the files were obtained.
  - 32. The method of claim 26, wherein the software code further comprises at least one instruction for outputting a subset of the data in the database to a different database.
  - 33. The method of claim 26, wherein the extracting step includes extracting new data from another data source.
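
Claim 32's step of outputting a subset of the data to a different database can be illustrated with Python's built-in `sqlite3` module. The sketch assumes a single hypothetical `metadata` table and selects the subset by author; the schema, the selection criterion, and the use of SQLite itself are illustrative assumptions rather than anything the patent specifies.

```python
import sqlite3

# Hypothetical schema shared by both databases.
SCHEMA = "CREATE TABLE IF NOT EXISTS metadata (digest TEXT, author TEXT, origin TEXT)"


def export_subset(src: sqlite3.Connection, dst: sqlite3.Connection, author: str) -> int:
    """Copy the rows for one author from src into dst; return the number copied."""
    dst.execute(SCHEMA)
    rows = src.execute(
        "SELECT digest, author, origin FROM metadata WHERE author = ?", (author,)
    ).fetchall()
    dst.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    dst.commit()
    return len(rows)


collective = sqlite3.connect(":memory:")
collective.execute(SCHEMA)
collective.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [
        ("d1", "alice", "backup_tape"),
        ("d1", "alice", "network"),
        ("d2", "bob", "network"),
    ],
)
other = sqlite3.connect(":memory:")
copied = export_subset(collective, other, "alice")
print(copied)  # 2
```

The source database is left unchanged; only the selected rows are copied into the second database.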

34. A method for executing operations in accordance with instructions stored on a computer readable medium having software code embodied therein for managing information, the software code comprising computer-executable instructions for:
- extracting original data from at least two different data sources, a backup tape data source and a networked data source and another data source;
  
  determining email files within the original data;
  
  extracting information from the email files using an email extraction engine;
  
  forwarding the information extracted from the email files to a de-duplication engine;
  
  separating the information extracted from the email files and other original data in content portions and metadata portions;
  
  analyzing the content portions;
  
  placing, into a collective database, a single copy of unique content portions; and
  
  placing, into a collective database, at least one copy of the metadata portions including information regarding from where the files were obtained;
  
  collecting the data within a collective database;
  
  from a user of the computer system, receiving at least one rule, including:
  
  a retention policy for the data; and
  
  a query for the data, wherein the query for the data includes a query of at least one of the following for the data:
  
  subject matter, author, type, and content;
  
  using the rule, identifying:
  
  compliant data that comply with the retention policy; and
  
  targeted data that correspond to the query; and
  
  using the rule, preserving the compliant data and the targeted data within the database, while deleting at least a portion of other data that are neither compliant data nor targeted data within the database.
- Dependent Claims (35, 36, 37, 38)
  - 35. The method of claim 34, wherein extracting the data comprises extracting the data from the backup tape without recreating a backup environment used for generating the backup tape.
  - 36. The method of claim 34, wherein the software code further comprises at least one instruction for storing the targeted data in a persistent memory source.
  - 37. The method of claim 34, wherein the at least two different data sources comprise a network, and wherein the software code further comprises at least one instruction for deleting the other data from the network.
  - 38. The method of claim 34, wherein:
    - specific data include a content portion and a metadata portion; and
      
      the database comprises:
      
      one copy of the content portion of the specific data; and
      
      different copies of the metadata portion including information regarding from where the specific data was obtained.

39. A method of executing operations based on a computer readable medium having software code embodied therein for managing information, the software code comprising computer-executable instructions for:
- extracting data from at least two different data sources, wherein the extracted data include files having content portions and associated metadata portions;
  
  determining email files within the original data;
  
  extracting information from the email files using an email extraction engine;
  
  forwarding the information extracted from the email files to a de-duplication engine;
  
  separating the information extracted from the email files and other original data in content portions and metadata portions;
  
  storing unique content portions in a collective database;
  
  storing the metadata portions in the collective database including information linking the metadata portions to their associated content portions;
  
  from a user of the computer system, receiving at least one rule, including:
  
  a retention policy for the data; and
  
  a query for the data, wherein the query is other than a search for duplicate portions of the data;
  
  using the rule, identifying:
  
  compliant data that comply with the retention policy; and
  
  targeted data that correspond to the query; and
  
  using the rule, preserving the compliant data and the targeted data within the collective database, while deleting at least a portion of other data that are neither compliant data nor targeted data within the collective database.
- Dependent Claims (40, 41, 42, 43, 44, 45, 46, 47)
  - 40. The method of claim 39, wherein:
    - the data source comprises at least two data sources, wherein a first data source is at least one backup tape and a second data source is a network.
  - 41. The method of claim 40, wherein extracting the data comprises extracting the data from the backup tape without recreating a backup environment used for generating the backup tape.
  - 42. The method of claim 39, wherein the software code further comprises at least one instruction for storing the targeted data in a different database.
  - 43. The method of claim 39, wherein the extracting includes extracting new data from another data source.
  - 44. The method of claim 39, wherein the data source comprises at least two data sources, wherein a first data source is one or more backup tapes and a second data source is the networked database, and wherein the software code further comprises at least one instruction for deleting the other data from the network.
  - 45. The method of claim 39, wherein the software code further comprises at least one instruction for auditing the collective database to determine data existing in the collective database that meet or do not meet a certain rule.
  - 46. The method of claim 39, wherein the software code further comprises at least one instruction for performing the steps of claim 39 on the data source on a periodic basis.
  - 47. The method of claim 39, wherein the query for the data includes a query of at least one of the following for the data:
    - subject matter, author, type, and content.
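
The email-handling steps that recur in the independent claims (determining email files, extracting information, and separating content portions from metadata portions) can be sketched with Python's standard `email` package. This sketch makes strong simplifying assumptions: email files are individual RFC 5322 messages recognised by a `.eml` suffix, and the helper names `looks_like_email` and `split_email` are hypothetical. The claims do not name a mail format, and real mail stores would need a dedicated parser.

```python
from email import message_from_bytes, policy
from typing import Dict, Tuple


def looks_like_email(name: str) -> bool:
    # Assumption: email files are identified by a .eml suffix.
    return name.lower().endswith(".eml")


def split_email(raw: bytes, origin: str) -> Tuple[str, Dict[str, str]]:
    """Separate a message into a content portion and a metadata portion."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    content = body.get_content() if body is not None else ""
    metadata = {
        "from": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "origin": origin,  # where the file was obtained
    }
    return content, metadata


raw = b"From: alice@example.com\nSubject: Q3 forecast\n\nNumbers attached.\n"
if looks_like_email("Inbox/msg1.EML"):
    content, metadata = split_email(raw, origin="backup_tape")
    print(metadata["subject"], "|", content.strip())
```

The content portion returned here could then be hashed by a de-duplication step, while the metadata portion records the sender, subject, and origin of that particular copy.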

Current Assignee: LDiscovery LLC
Original Assignee: Comerica Bank (Comerica Incorporated)
Inventors: Gomes, Robert
Primary Examiner(s): Trujillo, James
Assistant Examiner(s): Black, Linh
Application Number: US10/759,622
Time in Patent Office: 3,315 Days
Field of Search: 707 1-1041, 707/200, 707/204, 707/687, 707/694
US Class Current: 707/694
CPC Class Codes: G06F 16/10 File systems; File servers
