DATA DEDUPLICATION IN A FILE SYSTEM
Abstract
A data deduplication capability is presented. The data deduplication capability enables deduplication of data of a set of files, where the set of files may include files stored in network-based data storage elements and, optionally, files stored in one or more client devices which may communicate with the network-based data storage elements. The data deduplication capability may use one or more data deduplication techniques within files (for intra-file redundancy) or across files (for inter-file redundancy) in order to reduce or even minimize storage cost associated with storage of the files or bandwidth cost associated with transfers of the files. The data deduplication capability may use one or more data deduplication techniques in conjunction with one or more data compression techniques in order to reduce or even minimize storage cost associated with storage of the files or bandwidth cost associated with transfers of the files.
20 Claims
1. An apparatus, comprising:
a processor and a memory communicatively connected to the processor, the processor configured to:
receive a file comprising original file contents;
determine a set of data chunks of the original file contents of the file and a respective set of hash values of the data chunks;
determine whether the data chunks are stored in a data chunk store comprising a set of data chunks for one or more stored files;
encode the original file contents of the file, to form an encoded form of the original file contents of the file, based on the hash values of the data chunks;
compress the encoded form of the original file contents of the file to form a compressed and encoded form of the original file contents of the file; and
store the compressed and encoded form of the original file contents of the file.
10. A method, comprising:
using a processor and a memory for:
receiving a file comprising original file contents;
determining a set of data chunks of the original file contents of the file and a respective set of hash values of the data chunks;
determining whether the data chunks are stored in a data chunk store comprising a set of data chunks for one or more stored files;
encoding the original file contents of the file, to form an encoded form of the original file contents of the file, by removing the data chunks from the file and inserting the associated hash values of the data chunks into the file;
compressing the encoded form of the original file contents of the file to form a compressed and encoded form of the original file contents of the file; and
storing the compressed and encoded form of the original file contents of the file.
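The steps recited in claim 10 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration, since the claim does not specify them: fixed-size 4096-byte chunks, SHA-256 as the hash function, zlib as the compressor, an in-memory dictionary standing in for the data chunk store, and the step of adding missing chunks to that store. The function names `split_chunks`, `dedup_and_store`, and `restore` are hypothetical, and `restore` is included only to show that the encoded form is reversible.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096   # assumption: fixed-size chunking (the claim does not say how chunks are chosen)
HASH_LEN = 32       # length in bytes of a SHA-256 digest


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split the original file contents into consecutive chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_and_store(data: bytes, chunk_store: dict) -> bytes:
    """Chunk, hash, check the chunk store, encode, and compress."""
    chunks = split_chunks(data)
    hashes = [hashlib.sha256(c).digest() for c in chunks]

    # Determine whether each chunk is already in the chunk store.
    # Assumption: chunks that are missing are added to the store.
    for h, c in zip(hashes, chunks):
        if h not in chunk_store:
            chunk_store[h] = c

    # Encode: remove the chunks and insert their hash values instead.
    encoded = b"".join(hashes)

    # Compress the encoded form; the caller stores the returned bytes.
    return zlib.compress(encoded)


def restore(stored: bytes, chunk_store: dict) -> bytes:
    """Hypothetical inverse: decompress, then look each hash up in the store."""
    encoded = zlib.decompress(stored)
    hashes = [encoded[i:i + HASH_LEN] for i in range(0, len(encoded), HASH_LEN)]
    return b"".join(chunk_store[h] for h in hashes)
```

In this sketch a file made of two identical 4096-byte chunks followed by a different one contributes only two entries to the chunk store, while its stored form is the compressed sequence of three hash values.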
11. An apparatus, comprising:
a processor and a memory communicatively connected to the processor, the processor configured to:
receive, from a data storage element, a file comprising metadata and file contents, wherein the file has original file contents associated therewith, wherein the original file contents comprise a set of data chunks; and
based on a determination that the file does not include a reference to a reference file comprising a form of the original file contents of the file, initiate a process to ensure that the set of data chunks of the original file contents of the file is present within a data chunk store comprising data chunks for one or more stored files.
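The conditional behavior recited in claim 11 can be sketched in the same way. This is a rough illustration under stated assumptions, not the claimed apparatus: the reference to a reference file is modeled as a hypothetical `"reference_file"` key in the file metadata, the received file contents are taken to be the original file contents, chunks are fixed-size 4096-byte slices hashed with SHA-256, and the data chunk store is an in-memory dictionary. The names `StoredFile` and `ensure_chunks_present` are hypothetical.

```python
import hashlib
from dataclasses import dataclass

CHUNK_SIZE = 4096  # assumption: fixed-size chunking


@dataclass
class StoredFile:
    metadata: dict   # assumption: may hold a "reference_file" entry
    contents: bytes  # assumption: these are the original file contents


def ensure_chunks_present(f: StoredFile, chunk_store: dict) -> int:
    """Add any missing chunks of the file to the chunk store.

    Returns the number of chunks added. Nothing is done when the file
    already refers to a reference file.
    """
    if f.metadata.get("reference_file") is not None:
        return 0

    added = 0
    for i in range(0, len(f.contents), CHUNK_SIZE):
        chunk = f.contents[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        if digest not in chunk_store:
            chunk_store[digest] = chunk
            added += 1
    return added
```

Calling the function a second time on the same file adds nothing, because every chunk is already present in the store.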
Specification