Storage-network de-duplication
First Claim
1. A system comprising:
one or more processors;
a de-duplicated repository of a storage management system, coupled to the one or more processors; and
a de-duplication logic of the storage management system, coupled to the one or more processors and to the de-duplicated repository, wherein the de-duplication logic is operable to store files in the de-duplicated repository using a single storage encoding and to:
receive, from a client device over a network, a first request to store a file in the de-duplicated repository, wherein the first request includes an identifier of the file and a set of signatures that respectively identify a set of chunks from the file, wherein the client device is remote from the storage management system;
look up the set of signatures in the de-duplicated repository to determine whether any chunks in the set of chunks are not stored in the de-duplicated repository;
request, from the client device, those chunks from the set of chunks that are not stored in the de-duplicated repository;
for each chunk from the set of chunks that is not stored in the de-duplicated repository, store in the de-duplicated repository using the single storage encoding at least the chunk and a signature, from the set of signatures, that represents the chunk; and
store, in the de-duplicated repository, a file entry that represents the file and that associates the set of signatures with the identifier of the file, wherein the de-duplicated repository is stored on physical disk blocks that have a fixed size, and wherein the set of chunks are generated using a fingerprinting logic that is configured to generate variable-sized chunks in a manner that is dependent on the fixed size, such that each variable-sized chunk is no larger than the fixed size.
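The final limitation of the claim, variable-sized chunks that never exceed the fixed disk block size, can be illustrated with a short sketch. This is not the patent's fingerprinting logic: the 4096-byte block size, the minimum chunk length, the boundary mask, the toy running hash, and the use of SHA-256 for signatures are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096      # assumed fixed physical disk block size, in bytes
MIN_CHUNK = 256        # assumed lower bound so chunks do not become tiny
BOUNDARY_MASK = 0x3FF  # assumed: cut when the low 10 bits of the hash are zero


def split_into_chunks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data at content-defined boundaries, never exceeding block_size."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # Toy running hash over the current chunk (not a real Rabin fingerprint).
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= MIN_CHUNK and (rolling & BOUNDARY_MASK) == 0
        # Force a cut at block_size so that every chunk fits in one disk block.
        if at_boundary or length == block_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def signature(chunk: bytes) -> str:
    """Signature that identifies a chunk (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk).hexdigest()
```

Because a cut is forced whenever a chunk reaches block_size, the chunk boundaries depend on the content but the chunk length is bounded by the block size, which is the property the claim recites.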
Abstract
Techniques are provided for de-duplication of data. In one embodiment, a system comprises de-duplication logic that is coupled to a de-duplicated repository. The de-duplication logic is operable to receive, from a client device over a network, a request to store a file in the de-duplicated repository using a single storage encoding. The request includes a file identifier and a set of signatures that identify a set of chunks from the file. The de-duplication logic determines whether any chunks in the set are missing from the de-duplicated repository and requests the missing chunks from the client device. Then, for each missing chunk, the de-duplication logic stores in the de-duplicated repository that chunk and a signature representing that chunk. The de-duplication logic also stores, in the de-duplicated repository, a file entry that represents the file and that associates the set of signatures with the file identifier.
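The exchange described in the abstract can be sketched as a small in-memory model. This is a minimal illustration under stated assumptions, not the patented implementation: the DedupRepository class, its method names, the fetch_chunk callback that stands in for the network request to the client, and the use of SHA-256 as the signature are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DedupRepository:
    """In-memory stand-in for the de-duplicated repository (hypothetical)."""
    chunks: dict = field(default_factory=dict)  # signature -> chunk bytes
    files: dict = field(default_factory=dict)   # file identifier -> ordered signatures

    def missing(self, signatures):
        """Return signatures whose chunks are not yet stored, in request order."""
        seen = set()
        result = []
        for sig in signatures:
            if sig not in self.chunks and sig not in seen:
                seen.add(sig)
                result.append(sig)
        return result

    def store_file(self, file_id, signatures, fetch_chunk):
        """Handle one store request; fetch_chunk(sig) asks the client for a chunk."""
        requested = self.missing(signatures)
        for sig in requested:
            chunk = fetch_chunk(sig)
            # Assumed integrity check: SHA-256 stands in for the signature scheme.
            if hashlib.sha256(chunk).hexdigest() != sig:
                raise ValueError(f"chunk does not match signature {sig}")
            self.chunks[sig] = chunk
        # The file entry associates the file identifier with its signatures.
        self.files[file_id] = list(signatures)
        return requested

    def read_file(self, file_id):
        """Reassemble a file from its stored chunks."""
        return b"".join(self.chunks[sig] for sig in self.files[file_id])
```

In this sketch, a second file that shares chunks with an earlier one triggers a request only for the chunks the repository has not seen, so only the new data crosses the network.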
33 Claims
1. A system comprising:
one or more processors;
a de-duplicated repository of a storage management system, coupled to the one or more processors; and
a de-duplication logic of the storage management system, coupled to the one or more processors and to the de-duplicated repository, wherein the de-duplication logic is operable to store files in the de-duplicated repository using a single storage encoding and to:
receive, from a client device over a network, a first request to store a file in the de-duplicated repository, wherein the first request includes an identifier of the file and a set of signatures that respectively identify a set of chunks from the file, wherein the client device is remote from the storage management system;
look up the set of signatures in the de-duplicated repository to determine whether any chunks in the set of chunks are not stored in the de-duplicated repository;
request, from the client device, those chunks from the set of chunks that are not stored in the de-duplicated repository;
for each chunk from the set of chunks that is not stored in the de-duplicated repository, store in the de-duplicated repository using the single storage encoding at least the chunk and a signature, from the set of signatures, that represents the chunk; and
store, in the de-duplicated repository, a file entry that represents the file and that associates the set of signatures with the identifier of the file, wherein the de-duplicated repository is stored on physical disk blocks that have a fixed size, and wherein the set of chunks are generated using a fingerprinting logic that is configured to generate variable-sized chunks in a manner that is dependent on the fixed size, such that each variable-sized chunk is no larger than the fixed size.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 33
15. One or more non-transitory storage media storing instructions executable by one or more computing devices, the instructions comprising:
instructions that cause the one or more computing devices to receive, from a client device over a network, a first request to store a file in a de-duplicated repository of a storage management system, using a single storage encoding, wherein the first request includes an identifier of the file and a set of signatures that respectively identify a set of chunks from the file, wherein the client device is remote from the storage management system;
instructions that cause the one or more computing devices to look up the set of signatures in the de-duplicated repository to determine whether any chunks in the set of chunks are not stored in the de-duplicated repository;
instructions that cause the one or more computing devices to request, from the client device, those chunks from the set of chunks that are not stored in the de-duplicated repository;
for each chunk from the set of chunks that is not stored in the de-duplicated repository, instructions that cause the one or more computing devices to store in the de-duplicated repository using the single storage encoding at least the chunk and a signature, from the set of signatures, that represents the chunk; and
instructions that cause the one or more computing devices to store, in the de-duplicated repository, a file entry that represents the file and that associates the set of signatures with the identifier of the file, wherein the de-duplicated repository is stored on physical disk blocks that have a fixed size, and wherein the set of chunks are generated using a fingerprinting logic that is configured to generate variable-sized chunks in a manner that is dependent on the fixed size, such that each variable-sized chunk is no larger than the fixed size.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
29. A method comprising:
receiving, from a client device over a network, a first request to store a file in a de-duplicated repository of a storage management system, using a single storage encoding, wherein the first request includes an identifier of the file and a set of signatures that respectively identify a set of chunks from the file, and wherein the client device is remote from the storage management system;
looking up the set of signatures in the de-duplicated repository to determine whether any chunks in the set of chunks are not stored in the de-duplicated repository;
requesting, from the client device, those chunks from the set of chunks that are not stored in the de-duplicated repository;
for each chunk from the set of chunks that is not stored in the de-duplicated repository, storing in the de-duplicated repository using the single storage encoding at least the chunk and a signature, from the set of signatures, that represents the chunk; and
storing, in the de-duplicated repository, a file entry that represents the file and that associates the set of signatures with the identifier of the file, wherein the de-duplicated repository is stored on physical disk blocks that have a fixed size, and wherein the set of chunks are generated using a fingerprinting logic that is configured to generate variable-sized chunks in a manner that is dependent on the fixed size, such that each variable-sized chunk is no larger than the fixed size.
Dependent claims: 30, 31, 32
Specification