Distributed deduplicated storage system
First Claim
1. A method of performing a storage operation in a distributed, deduplicated storage system, comprising:
creating with a first deduplication node of a plurality of deduplication nodes, a first hash signature of a first data block of a plurality of data blocks associated with a file, a first header that at least identifies a first media agent that stored a copy of the first data block in a first storage device, and a first link to at least a location of the copy of the first data block in the first storage device;
creating with a second deduplication node, a second hash signature of at least a second data block associated with the file, a second header that at least identifies a second media agent that stored a copy of the second data block in a second storage device, and a second link to at least a location of the second data block in the second storage device;
sending from the second deduplication node to the first deduplication node, a copy of the second hash signature, a copy of the second header, and a copy of the second link;
receiving a first request from a client computing device to restore the file comprising the plurality of data blocks;
in response to the first request and using computer hardware, determining with the first deduplication node that the copy of the first data block of the plurality of data blocks in the requested file is stored at the first storage device;
accessing with the first media agent the first data block in the first storage device;
further determining with the first deduplication node that the copy of the second data block is stored on the second storage device based at least in part on accessing the copy of the second hash signature, the copy of the second header, and the copy of the second link stored in association with the first deduplication node;
sending a second request from the first media agent to the second media agent via a lightweight network that requests the second data block from the second media agent, wherein the second request comprises at least the copy of the second header, and the copy of the second link; and
accessing with the second media agent, the second data block from the second storage device based at least in part on the copy of the second header and the copy of the second link in the second request.
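To make the restore path of claim 1 concrete, the following Python sketch models two deduplication nodes, each paired with a media agent. The class and method names (MediaAgent, DedupNode, share_with, request_from, restore), the SHA-256 signatures, the dictionaries standing in for storage devices, and the link format are all hypothetical illustrations, not the patented implementation. The "second request" is modeled as a direct method call rather than a network message.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MediaAgent:
    """Hypothetical media agent; `storage` stands in for its storage device."""
    agent_id: str
    storage: dict = field(default_factory=dict)  # link -> block bytes

    def write(self, block: bytes) -> str:
        # The link records where the copy was placed (assumed format).
        link = f"{self.agent_id}/chunk-{len(self.storage)}"
        self.storage[link] = block
        return link

    def read(self, header: dict, link: str) -> bytes:
        # The header identifies which media agent stored the block.
        if header["media_agent"] != self.agent_id:
            raise ValueError("block was stored by a different media agent")
        return self.storage[link]

    def request_from(self, peer: "MediaAgent", header: dict, link: str) -> bytes:
        # Hypothetical "second request": carries only the copied header and link.
        return peer.read(header, link)


@dataclass
class DedupNode:
    """Hypothetical deduplication node holding signature -> (header, link)."""
    agent: MediaAgent
    table: dict = field(default_factory=dict)

    def store(self, block: bytes) -> str:
        sig = hashlib.sha256(block).hexdigest()
        if sig not in self.table:  # write each unique block only once
            link = self.agent.write(block)
            self.table[sig] = ({"media_agent": self.agent.agent_id}, link)
        return sig

    def share_with(self, other: "DedupNode") -> None:
        # Send copies of each signature, header, and link to another node.
        for sig, (header, link) in self.table.items():
            other.table.setdefault(sig, (dict(header), link))


def restore(node: DedupNode, signatures: list, peers: dict) -> bytes:
    """Rebuild a file from its ordered block signatures via `node`."""
    parts = []
    for sig in signatures:
        header, link = node.table[sig]
        if header["media_agent"] == node.agent.agent_id:
            parts.append(node.agent.read(header, link))  # local block
        else:
            peer = peers[header["media_agent"]]  # remote block
            parts.append(node.agent.request_from(peer, header, link))
    return b"".join(parts)
```

For example, if node n1 (media agent ma1) stores b"hello " and node n2 (media agent ma2) stores b"world", calling n2.share_with(n1) and then restore(n1, [s1, s2], {"ma2": ma2}) returns b"hello world", with the second block obtained through ma1's request to ma2 using only the copied header and link.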
4 Assignments
0 Petitions
Abstract
A distributed, deduplicated storage system according to certain embodiments is arranged in a parallel configuration including multiple deduplication nodes. Deduplicated data is distributed across the deduplication nodes. The deduplication nodes can be networked together and communicate with one another using a lightweight, customized communication scheme (e.g., a scheme based on FTP or HTTP). In some cases, deduplication management information, including deduplication signatures and/or other metadata, is stored in deduplication management nodes separately from the deduplicated data, improving performance and scalability.
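The split between deduplication management information and deduplicated data described in the abstract can be illustrated with a small Python sketch. Here a hypothetical ManagementNode keeps every signature and its placement, while DataNode objects hold only block payloads. The SHA-256 signatures, the fixed 4-byte blocks, and the rule of routing a new block to the data node selected by its signature modulo the node count are assumptions chosen for illustration; the abstract does not specify how data is distributed.

```python
import hashlib


class DataNode:
    """Hypothetical node that stores deduplicated block payloads only."""

    def __init__(self, name):
        self.name = name
        self.blocks = []  # list position serves as the block offset

    def put(self, block):
        self.blocks.append(block)
        return len(self.blocks) - 1


class ManagementNode:
    """Hypothetical node that stores signatures and placement, not data."""

    def __init__(self, data_nodes):
        self.data_nodes = {n.name: n for n in data_nodes}
        self.order = [n.name for n in data_nodes]
        self.index = {}  # signature -> (data node name, offset)

    def write_block(self, block):
        sig = hashlib.sha256(block).hexdigest()
        if sig not in self.index:  # a duplicate only adds a reference
            # Assumed placement rule: signature modulo the number of data nodes.
            name = self.order[int(sig, 16) % len(self.order)]
            self.index[sig] = (name, self.data_nodes[name].put(block))
        return sig

    def read_block(self, sig):
        name, offset = self.index[sig]
        return self.data_nodes[name].blocks[offset]


def write_file(mgmt, data, block_size=4):
    """Split `data` into fixed-size blocks; return the file's signature list."""
    return [mgmt.write_block(data[i:i + block_size])
            for i in range(0, len(data), block_size)]
```

Writing b"AAAABBBBAAAA" and then b"BBBBCCCC" through the same ManagementNode stores only three unique blocks across the data nodes, although the two files reference five blocks in total. Lookups touch only the management node's index until the payload itself is read.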
18 Claims
10. A distributed deduplicated storage system, comprising:
a plurality of deduplication nodes each comprising one or more processors and storage, the deduplication nodes in communication with one another via a network and a plurality of data blocks corresponding to a plurality of deduplicated files distributed across the deduplication nodes, a first deduplication node of the plurality of deduplication nodes creates a first hash signature of a first data block of the plurality of data blocks associated with a file, a first header that at least identifies a first media agent that stored a copy of the first data block in a first storage device, and a first link to at least a location of the copy of the first data block in the first storage device and a second deduplication node of the plurality of deduplication nodes creates a second hash signature of at least a second data block associated with the file, a second header that at least identifies a second media agent that stored a copy of the second data block in a second storage device, and a second link to at least a location of the second data block in the second storage device, wherein the second deduplication node sends a copy of the second hash signature, the second header, and the second link to the first deduplication node;
computer hardware configured to:
receive a request for the file comprised of a plurality of data blocks;
in response to the request, determine with the first deduplication node that the copy of the first data block of the plurality of data blocks exists at the first storage device;
access with the first media agent the particular data block from the first storage device based at least in part on the first header and the first link stored in association with the first deduplication node;
determine with the first deduplication node that the copy of the second data block exists at the second storage device based at least in part on the copy of the second hash signature, the copy of the second header, and the copy of the second link stored in association with the first deduplication node; and
send a second request from the first media agent to the second media agent via a lightweight network to obtain the second data block from the second storage device, the second media agent accesses the second data block from the second storage device based at least in part on the copy of the second header and the copy of the second link in the second request.
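The claims leave the "lightweight network" protocol open, and the abstract mentions schemes based on FTP or HTTP. The sketch below assumes plain HTTP GET using only the Python standard library: a server playing the second media agent answers requests whose query string carries the copied header (reduced here to a media agent ID) and the copied link, and a client function playing the first media agent fetches the block. The /block path, the query parameter names, the SECOND_AGENT_ID and SECOND_STORAGE values, and the helper names are hypothetical.

```python
import threading
import urllib.error
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical state of the second media agent: link -> block bytes.
SECOND_AGENT_ID = "ma2"
SECOND_STORAGE = {"vol1/chunk-7": b"second data block"}


class SecondAgentHandler(BaseHTTPRequestHandler):
    """Serves block reads for the second media agent over HTTP GET."""

    def do_GET(self):
        parsed = urllib.parse.urlparse(self.path)
        query = urllib.parse.parse_qs(parsed.query)
        agent = query.get("media_agent", [""])[0]  # from the copied header
        link = query.get("link", [""])[0]          # from the copied link
        block = SECOND_STORAGE.get(link)
        if parsed.path != "/block" or agent != SECOND_AGENT_ID or block is None:
            self.send_error(404, "block not found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(block)))
        self.end_headers()
        self.wfile.write(block)

    def log_message(self, *args):
        pass  # silence per-request logging in this sketch


def start_second_agent():
    """Start the second media agent on an ephemeral localhost port."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SecondAgentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    return server, f"http://{host}:{port}"


def request_block(base_url, header_agent_id, link):
    """First media agent: send the copied header and link, return the block."""
    query = urllib.parse.urlencode({"media_agent": header_agent_id, "link": link})
    with urllib.request.urlopen(f"{base_url}/block?{query}", timeout=5) as resp:
        return resp.read()
```

Calling start_second_agent() and then request_block(url, "ma2", "vol1/chunk-7") returns b"second data block"; an unknown link yields an HTTP 404, which urllib raises as urllib.error.HTTPError. One GET per block, with no session state, is one way such a scheme could stay lightweight.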
Specification