SYSTEM AND METHOD FOR CREATING A DE-DUPLICATED DATA SET AND PRESERVING METADATA FOR PROCESSING THE DE-DUPLICATED DATA SET
Abstract
The present invention provides a system and method for de-duplicating a large heterogeneous stock of data and collecting metadata associated with that data. Additionally, the system and method provide a means for retrieving data items based on specific criteria that can be identified in the collected metadata.
2 Claims
1. A method for de-duplicating and storing data, comprising the steps of:
reading the contents of a data file;
generating a hash value for the data file;
comparing the hash value with existing hash values;
storing the data file if its hash value does not match an existing hash value;
extracting metadata from the stored data file; and
storing the metadata and associating the metadata with the data file's hash value such that the metadata can be queried to identify the corresponding data file.
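The steps of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim names neither a hash algorithm nor a storage layout, so SHA-256, the in-memory dictionaries, the `DedupStore` class, and the `name`/`size` metadata fields are all hypothetical. The sketch also records metadata for every ingested copy, including duplicates, which the claim itself does not spell out.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory stand-in for the file store and metadata index."""

    def __init__(self):
        self.files = {}     # hash value -> file contents (one copy per hash)
        self.metadata = {}  # hash value -> list of metadata records

    def ingest(self, name, contents):
        """Store contents once per distinct hash; always record metadata."""
        # Generate a hash value from the file contents (SHA-256 is assumed).
        digest = hashlib.sha256(contents).hexdigest()
        # Compare against existing hash values; store only when none matches.
        is_new = digest not in self.files
        if is_new:
            self.files[digest] = contents
        # Extract (assumed) metadata and associate it with the hash value.
        record = {"name": name, "size": len(contents)}
        self.metadata.setdefault(digest, []).append(record)
        return digest, is_new

    def find(self, predicate):
        """Return hash values with at least one metadata record matching."""
        return [h for h, records in self.metadata.items()
                if any(predicate(r) for r in records)]
```

Ingesting the same bytes under two names would keep one stored copy and two metadata records, and `find` would resolve either name to the same hash value.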
2. A system for de-duplicating and storing data, comprising:
at least one pod adapted to read the content of a data file and generate a hash value corresponding to the data file;
a file system in communication with the at least one pod, adapted to store the data file and its hash value if its hash value does not match the hash value of a data file already stored in the file system; and
a database system in communication with the at least one pod and the file system, wherein the database system is adapted to receive and process metadata corresponding to the data file stored on the file system, and wherein the database stores the metadata and associates the metadata with the data file's hash value such that the metadata can be queried to identify the corresponding data file.
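The database element of claim 2 can be illustrated with a small relational index that maps metadata rows to hash values. This is a sketch only: the claim does not specify a database product or schema, so SQLite, the `metadata` table, its `name`/`owner`/`size` columns, and the helper function names are assumptions made for illustration.

```python
import sqlite3


def open_index():
    """Create an in-memory metadata index (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE metadata ("
        "hash TEXT NOT NULL, name TEXT, owner TEXT, size INTEGER)"
    )
    db.execute("CREATE INDEX idx_metadata_hash ON metadata (hash)")
    return db


def add_record(db, file_hash, name, owner, size):
    """Associate one metadata record with a stored file's hash value."""
    db.execute(
        "INSERT INTO metadata (hash, name, owner, size) VALUES (?, ?, ?, ?)",
        (file_hash, name, owner, size),
    )


def hashes_for_owner(db, owner):
    """Query the metadata to identify the corresponding stored files."""
    rows = db.execute(
        "SELECT DISTINCT hash FROM metadata WHERE owner = ? ORDER BY hash",
        (owner,),
    )
    return [row[0] for row in rows]
```

Because several metadata rows may point at one hash value, a query returns distinct hashes, each of which would identify a single stored copy in the file system.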
Specification