Method and system for enterprise-wide retention of digital or electronic data
16 Assignments
0 Petitions
Abstract
A method and system for managing data can be used to provide a comprehensive solution to retaining electronic data within an enterprise. Data may come from backup tapes or a network. Email files may be separated from other files on the backup tape. Data from the email files may be extracted and fed to a collective database. The other files (from the file backup tapes) and data from the network are processed by a de-duplication engine to remove duplicates of the content while keeping metadata from each copy of the content. The content and metadata are forwarded to the collective database. Filters or other rules may be applied to the collective database to identify compliant or targeted data. Many different operations can then be performed on the compliant and targeted data.
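The de-duplication step described in the abstract (remove duplicate content, keep the metadata of every copy) can be illustrated with a minimal Python sketch. It assumes each input file is modeled as a hypothetical `SourceFile` holding raw `content` bytes and a `metadata` dict, and that the collective database is a hypothetical in-memory `CollectiveDatabase`. The patent text does not name a hash function, so SHA-256 is an assumption here.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SourceFile:
    """Hypothetical input record: one file pulled from a backup tape or network share."""
    content: bytes
    metadata: dict  # e.g. {"source": "backup-tape-1", "path": "/a/report.doc"}


@dataclass
class CollectiveDatabase:
    """Hypothetical in-memory stand-in for the collective database."""
    contents: dict = field(default_factory=dict)  # content hash -> single stored copy
    metadata: list = field(default_factory=list)  # one entry per original copy


def deduplicate(files, db):
    """Store each unique content portion once while keeping every metadata portion."""
    for f in files:
        # Hash only the content portion, so identical bodies collide
        # even when their metadata (source, path, dates) differs.
        digest = hashlib.sha256(f.content).hexdigest()
        if digest not in db.contents:
            db.contents[digest] = f.content
        # Metadata is always kept and tagged with the hash of its content.
        db.metadata.append({**f.metadata, "content_hash": digest})
    return db


files = [
    SourceFile(b"quarterly report", {"source": "backup-tape-1", "path": "/a/report.doc"}),
    SourceFile(b"quarterly report", {"source": "file-server", "path": "/b/report.doc"}),
    SourceFile(b"meeting notes", {"source": "file-server", "path": "/b/notes.txt"}),
]
db = deduplicate(files, CollectiveDatabase())
print(len(db.contents), len(db.metadata))  # 2 3
```

Two of the three files share a body, so only two content copies are stored, while all three metadata records survive and still say where each copy came from.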
285 Citations
47 Claims
1. A method, performed by a computer system, for managing information associated with an enterprise, the method comprising the steps of:
obtaining original data from a plurality of different electronic data sources including at least two electronic data sources including a backup tape data source and a networked data source, wherein the original data includes a plurality of files having content portions and metadata portions;
determining email files within the original data;
extracting information from the email files using an email extraction engine;
forwarding the information extracted from the email files to a de-duplication engine;
separating the information extracted from the email files and other original data into content portions and metadata portions;
analyzing the content portions and the metadata portions by hashing the content portions of the information extracted and the other original data to form hashed values using a de-duplication engine and comparing the hashed values using the de-duplication engine;
placing, into a collective database, at least a single copy of unique content portions; and
placing, into a collective database, at least one copy of the metadata portions including information on where the files were obtained;
from a user of the computer system, receiving at least one rule, including:
a retention policy for the data; and
a query for the data, wherein the query is other than a search for duplicate portions of the data;
using the rule, which includes a logical operation to segregate the targeted data and the compliant data from other data, identifying:
compliant data that comply with the retention policy; and
targeted data that correspond to the query; and
using the rule, preserving the compliant data and the targeted data within the collective database, while deleting at least a portion of other data that are neither compliant data nor targeted data within the collective database.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
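The rule in claim 1 combines a retention policy and a query through a logical operation that separates compliant and targeted data from everything else. A minimal Python sketch, assuming both parts of the rule are plain predicates over dict records (the `Rule` class and `apply_rule` function are hypothetical names, not taken from the patent):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical rule object: a retention policy and a user query, both as predicates."""
    retention_policy: Callable[[dict], bool]
    query: Callable[[dict], bool]


def apply_rule(records, rule):
    """Identify compliant and targeted records, then keep them and drop the rest."""
    compliant = [r for r in records if rule.retention_policy(r)]
    targeted = [r for r in records if rule.query(r)]
    # Logical OR segregates the data: anything compliant or targeted is preserved.
    preserved = [r for r in records if rule.retention_policy(r) or rule.query(r)]
    deleted = [r for r in records if not (rule.retention_policy(r) or rule.query(r))]
    return compliant, targeted, preserved, deleted


records = [
    {"id": 1, "type": "contract", "subject": "lease renewal"},
    {"id": 2, "type": "email", "subject": "merger timeline"},
    {"id": 3, "type": "email", "subject": "lunch order"},
]
rule = Rule(
    retention_policy=lambda r: r["type"] == "contract",  # keep all contracts
    query=lambda r: "merger" in r["subject"],             # user search, not a duplicate search
)
compliant, targeted, preserved, deleted = apply_rule(records, rule)
print([r["id"] for r in preserved], [r["id"] for r in deleted])  # [1, 2] [3]
```

The claim only requires deleting "at least a portion" of the other data; this sketch deletes all of it, which is a simplification.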
13. A method, performed by one or more computer systems, for managing information associated with an enterprise, the method comprising the steps of:
extracting specific data from original data obtained from at least two different data sources including a backup tape data source and a networked data source, wherein the specific data contains at least a content portion and a metadata portion, by first separating email data from the original data and extracting information from the email data and forwarding the information to a de-duplication engine;
separating the content portion from the metadata portion for the information from the email data and other original data;
analyzing the content portions;
placing, into a collective database, at least a single copy of unique content portions;
placing, into the collective database, at least one copy of the metadata portions including information on where the files were obtained;
collecting the specific data within a collective database, wherein the collective database includes one copy of the content portion of the specific data and different copies of the metadata portion including information regarding from where the specific data was obtained;
from a user of the computer system, receiving at least one rule, including:
a retention policy for the data; and
a query for the data, wherein the query is other than a search for duplicate portions of the data;
using the rule, identifying:
compliant data that comply with the retention policy; and
targeted data that correspond to the query; and
using the rule, preserving the compliant data and the targeted data within the database, while deleting at least a portion of other data that are neither compliant data nor targeted data within the database.
(Dependent claims: 14, 15, 16, 17)
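Claim 13 first separates email data from the original data and extracts information from it before de-duplication. The claim does not say how email files are recognized, so the sketch below assumes a hypothetical rule (file name ends in `.eml`) and uses Python's standard `email` package to split each message into a content portion (plain-text body) and a metadata portion (selected headers). Non-email files are passed through with their path as metadata; both lists would then feed a de-duplication step.

```python
from email import policy
from email.parser import BytesParser


def is_email_file(name: str) -> bool:
    """Hypothetical detection rule: treat .eml files as email files."""
    return name.lower().endswith(".eml")


def extract_email(raw: bytes) -> dict:
    """Split one RFC 5322 message into a content portion and a metadata portion."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content().strip() if body_part is not None else ""
    # Keep only the headers that are present on this message.
    metadata = {key: str(msg[key]) for key in ("From", "To", "Subject", "Date") if msg[key] is not None}
    return {"content": body.encode("utf-8"), "metadata": metadata}


def split_original_data(files: dict):
    """Separate email files from the rest; emails go through the extraction step."""
    emails, others = [], []
    for name, raw in files.items():
        if is_email_file(name):
            emails.append(extract_email(raw))
        else:
            others.append({"content": raw, "metadata": {"path": name}})
    return emails, others


files = {
    "inbox/1.eml": b"From: alice@example.com\nTo: bob@example.com\nSubject: Budget\n\nPlease review the budget.\n",
    "docs/plan.txt": b"Project plan",
}
emails, others = split_original_data(files)
print(emails[0]["metadata"]["Subject"], len(others))  # Budget 1
```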
18. A method, performed by a computer system, for managing information associated with at least one enterprise, the method comprising:
extracting data from at least two data sources, a backup tape data source and a networked data source, wherein the extracted data include files having content portions and associated metadata portions;
determining email files within the original data;
extracting information from the email files using an email extraction engine;
forwarding the information extracted from the email files to a de-duplication engine;
separating the information extracted from the email files and other original data into content portions and metadata portions;
analyzing the content portions and the metadata portions by hashing the content portions of the information extracted and the other original data to form hashed values using a de-duplication engine and comparing the hashed values using the de-duplication engine;
placing, into a collective database, at least a single copy of unique content portions;
placing, into the collective database, at least one copy of the metadata portions including information on where the files were obtained;
storing unique content portions in the collective database;
storing the metadata portions in the collective database including information linking the metadata portions to their associated content portions;
from a user of the computer system, receiving at least one rule, including:
a retention policy for the data; and
a query for the data, wherein the query is other than a search for duplicate portions of the data;
using the rule, identifying:
compliant data that comply with the retention policy; and
targeted data that correspond to the query; and
using the rule, preserving the compliant data and the targeted data within the collective database, while deleting at least a portion of other data that are neither compliant data nor targeted data within the collective database.
(Dependent claims: 19, 20, 21, 22, 23, 24, 25)
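Claim 18 stores unique content portions once and stores every metadata portion with information linking it to its content. One way to model that link is a relational schema in which each metadata row carries the hash of its content row. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the table and column names, the `store` helper, and the use of SHA-256 are all illustrative assumptions.

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE content (
    hash TEXT PRIMARY KEY,   -- hash of the content portion
    body BLOB NOT NULL       -- the single stored copy
);
CREATE TABLE metadata (
    id INTEGER PRIMARY KEY,
    content_hash TEXT NOT NULL REFERENCES content(hash),  -- link to the content portion
    source TEXT NOT NULL,    -- where the file was obtained
    path TEXT NOT NULL
);
"""


def store(conn, body: bytes, source: str, path: str) -> str:
    """Hypothetical helper: add one file, storing its content only if it is new."""
    digest = hashlib.sha256(body).hexdigest()
    # INSERT OR IGNORE keeps only the first copy of identical content.
    conn.execute("INSERT OR IGNORE INTO content (hash, body) VALUES (?, ?)", (digest, body))
    # Every copy gets its own metadata row pointing at the shared content row.
    conn.execute(
        "INSERT INTO metadata (content_hash, source, path) VALUES (?, ?, ?)",
        (digest, source, path),
    )
    return digest


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store(conn, b"board minutes", "backup-tape-7", "/minutes.doc")
store(conn, b"board minutes", "file-server", "/shared/minutes.doc")
rows = conn.execute(
    "SELECT m.source, m.path FROM metadata m JOIN content c ON c.hash = m.content_hash ORDER BY m.id"
).fetchall()
content_count = conn.execute("SELECT COUNT(*) FROM content").fetchone()[0]
print(content_count, rows)
```

The join recovers every place a document was found while the body itself is held only once.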
26. A method utilizing a computer for executing instructions stored on a computer readable medium having software code embodied therein for managing information, the software code comprising computer-executable instructions for:
extracting data from at least two data sources, a first backup tape data source and a second networked data source;
determining email files within the original data;
extracting information from the email files using an email extraction engine;
forwarding the information extracted from the email files to a de-duplication engine;
separating the information extracted from the email files and other original data into content portions and metadata portions;
analyzing the content portions;
placing, into a collective database, a single copy of unique content portions; and
placing, into a collective database, at least one copy of the metadata portions including information regarding from where the files were obtained;
from a user of the computer system, receiving at least one rule, including:
a retention policy for the data; and
a query for the data, wherein the query is other than a search for duplicate portions of the data;
using the rule, identifying:
compliant data that comply with the retention policy; and
targeted data that correspond to the query;
storing the targeted data in a persistent memory source; and
using the rule, preserving the compliant data and the targeted data within the database, while deleting at least a portion of other data that are neither compliant data nor targeted data within the database.
(Dependent claims: 27, 28, 29, 30, 31, 32, 33)
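Claim 26 adds a step of storing the targeted data in a persistent memory source. The claim does not specify the storage format; as one possible illustration, the sketch below writes targeted records to a JSON Lines file and reads them back. The file name and the helper functions `persist_targeted` and `load_targeted` are hypothetical.

```python
import json
import os
import tempfile


def persist_targeted(records, path):
    """Write targeted records to a JSON Lines file so they survive the cleanup step."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, sort_keys=True) + "\n")


def load_targeted(path):
    """Read the persisted records back, one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


targeted = [{"id": 2, "subject": "merger timeline", "source": "file-server"}]
path = os.path.join(tempfile.mkdtemp(), "targeted.jsonl")
persist_targeted(targeted, path)
restored = load_targeted(path)
print(restored == targeted)  # True
```

A line-per-record format lets the targeted set be appended to over time, though any durable store (a database table, an object store) would fit the claim language equally well.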
34. A method for executing operations in accordance with instructions stored on a computer readable medium having software code embodied therein for managing information, the software code comprising computer-executable instructions for:
extracting original data from at least two different data sources, a backup tape data source and a networked data source and another data source;
determining email files within the original data;
extracting information from the email files using an email extraction engine;
forwarding the information extracted from the email files to a de-duplication engine;
separating the information extracted from the email files and other original data into content portions and metadata portions;
analyzing the content portions;
placing, into a collective database, a single copy of unique content portions; and
placing, into a collective database, at least one copy of the metadata portions including information regarding from where the files were obtained;
collecting the data within a collective database;
from a user of the computer system, receiving at least one rule, including:
a retention policy for the data; and
a query for the data, wherein the query for the data includes a query of at least one of the following for the data: subject matter, author, type, and content;
using the rule, identifying:
compliant data that comply with the retention policy; and
targeted data that correspond to the query; and
using the rule, preserving the compliant data and the targeted data within the database, while deleting at least a portion of other data that are neither compliant data nor targeted data within the database.
(Dependent claims: 35, 36, 37, 38)
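Claim 34 narrows the query to at least one of subject matter, author, type, and content. A minimal Python sketch of such a query, assuming records are dicts with `subject`, `author`, `type`, and `content` keys (a hypothetical layout) and that every field the user fills in must match:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    """Hypothetical query over the four fields named in claim 34; None means 'any'."""
    subject: Optional[str] = None
    author: Optional[str] = None
    doc_type: Optional[str] = None
    content: Optional[str] = None

    def matches(self, record: dict) -> bool:
        # Every populated text field must match (logical AND), as a case-insensitive substring.
        checks = [
            (self.subject, record.get("subject", "")),
            (self.author, record.get("author", "")),
            (self.content, record.get("content", "")),
        ]
        for needle, haystack in checks:
            if needle is not None and needle.lower() not in haystack.lower():
                return False
        # The type is compared exactly.
        if self.doc_type is not None and record.get("type") != self.doc_type:
            return False
        return True


records = [
    {"id": 1, "subject": "Merger terms", "author": "Alice", "type": "email", "content": "Draft attached."},
    {"id": 2, "subject": "Merger terms", "author": "Bob", "type": "memo", "content": "See notes."},
    {"id": 3, "subject": "Lunch", "author": "Alice", "type": "email", "content": "Pizza?"},
]
query = Query(subject="merger", doc_type="email")
print([r["id"] for r in records if query.matches(r)])  # [1]
```

Leaving unused fields as `None` mirrors the "at least one of" wording: a query may constrain any subset of the four fields.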
39. A method of executing operations based on a computer readable medium having software code embodied therein for managing information, the software code comprising computer-executable instructions for:
extracting data from at least two different data sources, wherein the extracted data include files having content portions and associated metadata portions;
determining email files within the original data;
extracting information from the email files using an email extraction engine;
forwarding the information extracted from the email files to a de-duplication engine;
separating the information extracted from the email files and other original data into content portions and metadata portions;
storing unique content portions in a collective database;
storing the metadata portions in the collective database including information linking the metadata portions to their associated content portions;
from a user of the computer system, receiving at least one rule, including:
a retention policy for the data; and
a query for the data, wherein the query is other than a search for duplicate portions of the data;
using the rule, identifying:
compliant data that comply with the retention policy; and
targeted data that correspond to the query; and
using the rule, preserving the compliant data and the targeted data within the collective database, while deleting at least a portion of other data that are neither compliant data nor targeted data within the collective database.
(Dependent claims: 40, 41, 42, 43, 44, 45, 46, 47)
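Every independent claim relies on a retention policy but leaves its form open. One common form is a schedule of minimum retention periods per record type. The sketch below assumes a hypothetical `RETENTION_PERIODS` table and treats a record as compliant while it is still inside its period; the periods themselves are made-up examples, not values from the patent.

```python
from datetime import date, timedelta

# Hypothetical retention schedule: record type -> how long it must be kept.
RETENTION_PERIODS = {
    "contract": timedelta(days=7 * 365),
    "email": timedelta(days=3 * 365),
}


def is_compliant(record: dict, today: date) -> bool:
    """A record complies with the policy while it is still inside its retention period."""
    period = RETENTION_PERIODS.get(record["type"])
    if period is None:
        return False  # types without a schedule are not retained by this policy
    return today - record["created"] <= period


today = date(2024, 1, 1)
records = [
    {"id": 1, "type": "contract", "created": date(2019, 6, 1)},
    {"id": 2, "type": "email", "created": date(2019, 6, 1)},
    {"id": 3, "type": "photo", "created": date(2023, 6, 1)},
]
print([r["id"] for r in records if is_compliant(r, today)])  # [1]
```

Both dated records are about four and a half years old, which is inside the assumed seven-year contract period but outside the three-year email period. A predicate like `is_compliant` could serve as the retention-policy half of a rule alongside a user query.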
Specification